Fix two bugs related to the interaction of weakrefs and the garbage
collector. The weakrefs in the tp_subclasses dictionary are needed in
order to correctly invalidate type caches (for example, by calling
PyType_Modified()). Clearing weakrefs before calling finalizers causes
the caches to not be correctly invalidated. That can cause crashes since the
caches can refer to invalid objects. This is fixed by deferring the
clearing of weakrefs to classes and without callbacks until after finalizers
are executed.
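
As a rough illustration of why these weakrefs matter: a class tracks its subclasses through weak references, and that weak list is what __subclasses__() reports. The sketch below uses hypothetical Base and Child classes and only shows the bookkeeping, not the type-cache invalidation or the crash itself.

```python
import gc


class Base:
    pass


class Child(Base):
    pass


# The subclass is visible through the parent's weakly held subclass list.
subclasses_before = [cls.__name__ for cls in Base.__subclasses__()]

# Classes sit in reference cycles, so dropping the name is not enough;
# a GC pass is needed to actually free the class.
del Child
gc.collect()

subclasses_after = Base.__subclasses__()
print(subclasses_before, subclasses_after)
```

Once the class is collected, its entry disappears from the parent's list. The bug discussed here is about what happens when that entry is cleared too early, before finalizers have had a chance to run.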

The second bug is caused by weakrefs created while running finalizers. Those
weakrefs can be outside the set of unreachable garbage and therefore survive
the delete_garbage() phase (where tp_clear() is called on objects).
Those weakrefs can expose objects that have had tp_clear() called on
them to Python-level code. See GH-91636 as an example of this kind of
bug. This is fixed by clearing all weakrefs to unreachable objects after
finalizers have been executed.
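
The shape of this second scenario can be sketched as follows. This is an assumption-laden illustration, not a reproduction of GH-91636: the Node class and the late_refs list are hypothetical names, and the script only shows a weakref being created inside a finalizer for an object that is part of cyclic garbage.

```python
import gc
import weakref

late_refs = []  # weakrefs created while finalizers are running


class Node:
    def __init__(self):
        self.cycle = self  # reference cycle, so only the GC can free it

    def __del__(self):
        # This weakref did not exist when the GC computed the set of
        # unreachable objects; it is created during the finalizer phase.
        late_refs.append(weakref.ref(self))


def make_garbage():
    Node()


make_garbage()
gc.collect()

# Once collection has finished, the referent is gone and the weakref is dead.
# The bug concerned the window between tp_clear() and deallocation.
print(late_refs[0]() is None)
```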

Issue: Segmentation fault, possibly due to a GC issue (tp_subclasses) #135552

Make the GC clear weakrefs later.

42abb05

Clear the weakrefs to unreachable objects after finalizers are called.

bedevere-appbot mentioned this pull request

Jul 1, 2025

Segmentation fault, possibly due to a GC issue (tp_subclasses) #135552

Open

Remove inaccurate comment.

17a4f9e

neonene (Contributor) commented Jul 1, 2025

I can confirm this PR fixes the gh-132413 issue as well.

Run clear_weakrefs() with world stopped.

12f0b5c

nascheme (Member, Author) commented Jul 1, 2025

I think this fixes (or mostly fixes) gh-91636 as well.

nascheme mentioned this pull request

Jul 1, 2025

Assertion failure when func_repr is called on an already tp_clear-ed object #91636

Closed

nascheme requested a review from pablogsal

July 1, 2025 23:33

nascheme added the 🔨 test-with-buildbots label

Jul 1, 2025

bedevere-bot commented Jul 1, 2025

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 12f0b5c 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

bedevere-bot removed the 🔨 test-with-buildbots label

Jul 1, 2025

nascheme (Member, Author) commented Jul 2, 2025

This introduces refleaks, it seems. One of the leaking tests:
test.test_concurrent_futures.test_shutdown.ProcessPoolSpawnProcessPoolShutdownTest.test_shutdown_gh_132969_case_1
My unconfirmed suspicion is that a finalizer is now resurrecting an object via a weakref. Previously that weakref would be cleared before the finalizer is run. The multiprocessing finalizer logic seems very complicated. :-/

nascheme (Member, Author) commented Jul 2, 2025

This is the smallest leaking example I could make so far. Something with ProcessPoolExecutor leaking maybe.

    import time
    import unittest
    from concurrent import futures

    class LeakTest(unittest.TestCase):
        @classmethod
        def _fail_task(cls, n):
            raise ValueError("failing task")

        def test_leak_case(self):
            # this leaks references
            executor = futures.ProcessPoolExecutor(
                    max_workers=1,
                    max_tasks_per_child=1,
                    )
            f2 = executor.submit(LeakTest._fail_task, 0)
            try:
                f2.result()
            except ValueError:
                pass

            # Ensure that the executor cleans up after called
            # shutdown with wait=False
            executor_manager_thread = executor._executor_manager_thread
            executor.shutdown(wait=False)
            time.sleep(0.2)
            executor_manager_thread.join()

        def test_leak_case2(self):
            # this does not leak
            with futures.ProcessPoolExecutor(
                    max_workers=1,
                    max_tasks_per_child=1,
                    ) as executor:
                f2 = executor.submit(LeakTest._fail_task, 0)
                try:
                    f2.result()
                except ValueError:
                    pass

neonene (Contributor) commented Jul 2, 2025 (edited)

Other leaking examples (on Windows):

1. test_logging:

    import logging
    import logging.config
    import logging.handlers
    import unittest
    from multiprocessing import Queue, Manager

    class ConfigDictTest(unittest.TestCase):
        def test_multiprocessing_queues_XXX(self):
            config = {
                'version': 1,
                'handlers': {
                    'spam': {
                        'class': 'logging.handlers.QueueHandler',
                        'queue': Manager().Queue(),  # Leak
                        # 'queue': Manager().JoinableQueue()  # Leak
                        # 'queue': Queue(),                   # No leak
                    },
                },
                'root': {'handlers': ['spam']}
            }
            logger = logging.getLogger()
            logging.config.dictConfig(config)
            while logger.handlers:
                h = logger.handlers[0]
                logger.removeHandler(h)
                h.close()

2. test_interpreters.test_api:

    import contextlib
    import threading
    import types
    import unittest
    from concurrent import interpreters

    def func():
        raise Exception('spam!')

    @contextlib.contextmanager
    def captured_thread_exception():
        ctx = types.SimpleNamespace(caught=None)
        def excepthook(args):
            ctx.caught = args
        orig_excepthook = threading.excepthook
        threading.excepthook = excepthook
        try:
            yield ctx
        finally:
            threading.excepthook = orig_excepthook

    class TestInterpreterCall(unittest.TestCase):
        def test_call_in_thread_XXX(self):
            interp = interpreters.create()
            call = (interp._call, interp.call)[1]  # 0: No leak, 1: Leak
            with captured_thread_exception() as _:
                t = threading.Thread(target=call, args=(interp, func, (), {}))
                t.start()
                t.join()

UPDATE:

    import unittest
    import weakref, _interpreters

    wd = weakref.WeakValueDictionary()

    class TestInterpreterCall(unittest.TestCase):
        def test_call_in_thread_XXX(self):
            id = _interpreters.create()
            wd[id] = type("", (), {})
            _interpreters.destroy(id)

neonene mentioned this pull request

Jul 2, 2025

gh-132413: Clear weakref to _datetime after modules are finalized #136152

Closed

nascheme (Member, Author) commented Jul 2, 2025

The majority (maybe all) of these leaks are caused by the WeakValueDictionary used as multiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the .data dict works but it is not too elegant.

Defer weakref clears only for refs to classes.

2f3daba

This avoids breaking tests while fixing the issue with tp_subclasses. In the long term, it would be better to defer the clear of all weakrefs, not just the ones referring to classes. However, that is a more disruptive change and would seem to have a higher chance of breaking user code. So, it would not be something to do in a bugfix release.

nascheme (Member, Author) commented Jul 3, 2025

> The majority (maybe all) of these leaks are caused by the WeakValueDictionary used as multiprocessing.util._afterfork_registry. That took some digging to find. I'm not yet sure of a good fix for this. Explicitly cleaning the dead weak references from the .data dict works but it is not too elegant.

Nope, that doesn't fix all the leaks. And having to explicitly clean the weakrefs from the WeakValueDictionary really shouldn't be needed, I think. The KeyedRef class uses a callback, so dead entries should be removed from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

For the purposes of having a fix that we can backport (should probably be backported to all maintained Python versions), a less disruptive fix would be better. To that end, I've changed this PR to only defer clearing weakrefs to class objects. That fixes the tp_subclasses bug but should be less likely to break currently working code.
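
The expectation stated above, that a WeakValueDictionary drops an entry on its own once the value dies, can be checked with a small sketch. The Handler class and the "spam" key are hypothetical; this only demonstrates the normal behaviour, not the leak under investigation.

```python
import gc
import weakref


class Handler:
    pass


registry = weakref.WeakValueDictionary()

handler = Handler()
registry["spam"] = handler
present_while_alive = "spam" in registry

# Drop the only strong reference; the weakref callback installed by the
# dictionary should remove the entry without any explicit cleanup.
del handler
gc.collect()

present_after_death = "spam" in registry
print(present_while_alive, present_after_death, len(registry))
```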

nascheme added the 🔨 test-with-buildbots label

Jul 3, 2025

bedevere-bot commented Jul 3, 2025

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 2f3daba 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136189%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

bedevere-bot removed the 🔨 test-with-buildbots label

Jul 3, 2025

Ensure weakrefs with callbacks are cleared early.

123bc25

We need to clear those before executing the callback. Since this ensures they can't run a second time, we don't need _PyGC_SET_FINALIZED(). Revise comments.

nascheme (Member, Author) commented Jul 3, 2025

> The KeyedRef class uses a callback and so they should be cleaned from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

Ah, the KeyedRef callback requires that the weakref is already cleared when it is called; otherwise it does not delete the item from the weak dictionary. So we need to clear the weakrefs with callbacks before executing them. That fixes the refleaks, I believe.

I revised this PR to be something that is potentially suitable for backporting. To minimize the behaviour change, I'm only deferring the clear of weakrefs that refer to types (in order to allow tp_subclasses to work) or with callbacks. I still have an extra clearing pass that gets done after the finalizers are called. That avoids bugs like #91636.

If this gets merged, I think we should consider deferring clearing all weakrefs (without callbacks) until after finalizers are executed. I think that's more correct since it gives the finalizers a better opportunity to do their job.
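
The requirement on callbacks described in this comment can be sketched with a toy registry. It is loosely modeled on the idea behind KeyedRef and is not the actual WeakValueDictionary implementation; registry, observed and make_remover are hypothetical names.

```python
import gc
import weakref


class Value:
    pass


registry = {}   # key -> weakref, a much simplified weak-value mapping
observed = []   # what the callback saw when it dereferenced the weakref


def make_remover(key):
    def remove(wr):
        observed.append(wr())
        # Only drop the entry if it still holds this weakref and the
        # weakref is dead; a live weakref would leave the entry in place.
        if registry.get(key) is wr and wr() is None:
            del registry[key]
    return remove


value = Value()
registry["spam"] = weakref.ref(value, make_remover("spam"))

del value
gc.collect()
print(observed, registry)
```

Because the callback only removes entries whose weakref is dead, running callbacks on weakrefs that have not yet been cleared would leave stale entries behind, which matches the refleak symptom described above.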

sergey-miryanov reviewed Jul 3, 2025

Python/gc.c (resolved)

sergey-miryanov reviewed Jul 3, 2025

Python/gc.c (resolved)

sergey-miryanov reviewed Jul 3, 2025

Python/gc.c

		* not be cleared so that caches based on the type version are correctly
		* invalidated (see GH-135552 as a bug caused by this).
		*/
		clear_weakrefs(&final_unreachable);

sergey-miryanov (Contributor) commented Jul 3, 2025

Ough! Nice trick with final_unreachable!

pablogsal self-assigned this

Jul 3, 2025

nascheme added 3 commits

July 3, 2025 06:04

Add NEWS.

496c0b1

Add comment about wrlist iteration.

8a553d1

This is a bit tricky because the wrlist can be changing as we iterate over it.

Merge 'origin/main' into gh-135552-wr-clear-later

01c882f

StanFromIreland mentioned this pull request

Jul 5, 2025

gh-132413: Check if datetime current state is not NULL #136321

Closed

sergey-miryanov (Contributor) commented Jul 7, 2025

> The KeyedRef class uses a callback and so they should be cleaned from the dict when the referred value dies. So, I'm not exactly sure what's going on there.

IIUC, there may be cases when selfref or self_weakref is already cleared but the WeakValueDictionary or _WeakValueDictionary still has values that should be cleared too. They will not be cleared because of the conditions "if selfref() is not None" and "if self_weakref() is not None".

IIUC, because PEP 442 landed we can remove the weakrefs here

cpython/Lib/weakref.py

Line 105 in 0c3e3da

def remove(wr, selfref=ref(self), _atomic_removal=_remove_dead_weakref):

and there

cpython/Lib/importlib/_bootstrap.py

Line 65 in 0c3e3da

self_weakref = _weakref.ref(self)

and use a strong reference that will outlive the KeyedRefs in the dictionaries. I replaced the weakref with a strong reference in the _bootstrap.py and checked with refleak buildbot (#136345). No leaks were found.

@nascheme WDYT?

sergey-miryanov mentioned this pull request

Jul 7, 2025

Replace weakref to the _WeakValueDictionary with strong ref to outlives keys #136345

Closed

neonene (Contributor) commented Jul 7, 2025

Weakref with a callback needs to be cleared early (as-is), accounting for the 3rd-party applications that have similar logic to WeakValueDictionary.

sergey-miryanov (Contributor) commented Jul 7, 2025

> Weakref with a callback needs to be cleared early (as-is)

I'm not against this. But IIUC (feel free to correct me) we can face a situation in which the weakref to the WeakValueDictionary is cleared in one gc collect call and the KeyedRefs are cleared in another. A strong ref should handle this situation. We can teach users about this (I suppose).

neonene (Contributor) commented Jul 7, 2025

> we can face a situation

The experiment #136345 seems currently invalid, which is based on main and not effective on this PR where I omitted calling _PyWeakref_ClearRef in handle_weakref_callbacks.

sergey-miryanov (Contributor) commented Jul 7, 2025

> The experiment #136345 seems currently invalid, which is based on main and not effective on this PR where I omitted calling _PyWeakref_ClearRef in handle_weakref_callbacks.

If we use a strong reference, then it doesn't matter if we omit _PyWeakref_ClearRef or not. Am I wrong?

neonene (Contributor) commented Jul 7, 2025

Before discussing, have you checked on this PR? I can see the leaks.

sergey-miryanov (Contributor) commented Jul 7, 2025 (edited)

Results

    # clean test-weak-value-dict
    ➜ .\python.bat -m test test_logging -R :
    Running Debug|x64 interpreter...
    Using random seed: 3090750191
    0:00:00 Run 1 test sequentially in a single process
    0:00:00 [1/1] test_logging
    beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
    12345:6789
    XX... ....
    0:06:26 [1/1] test_logging passed in 6 min 26 sec
    == Tests result: SUCCESS ==
    1 test OK.
    Total duration: 6 min 26 sec
    Total tests: run=272 skipped=13
    Total test files: run=1/1
    Result: SUCCESS

    # clean gh-135552-wr-clear-later
    ➜ .\python.bat -m test test_logging -R :
    Running Debug|x64 interpreter...
    Using random seed: 2520806690
    0:00:00 Run 1 test sequentially in a single process
    0:00:00 [1/1] test_logging
    beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
    12345:6789
    XX... ...2
    test_logging leaked [0, 0, 0, 2] memory blocks, sum=2 (this is fine)
    0:06:19 [1/1] test_logging passed in 6 min 19 sec
    == Tests result: SUCCESS ==
    1 test OK.
    Total duration: 6 min 19 sec
    Total tests: run=272 skipped=13
    Total test files: run=1/1
    Result: SUCCESS

    # gh-135552-wr-clear-later + weakref removed from WeakValueDictionary
    ➜ .\python.bat -m test test_logging -R :
    Running Debug|x64 interpreter...
    Using random seed: 1096797289
    0:00:00 Run 1 test sequentially in a single process
    0:00:00 [1/1] test_logging
    beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
    12345:6789
    XX.2. ....
    0:06:24 [1/1] test_logging passed in 6 min 24 sec
    == Tests result: SUCCESS ==
    1 test OK.
    Total duration: 6 min 24 sec
    Total tests: run=272 skipped=13
    Total test files: run=1/1
    Result: SUCCESS

    # gh-135552-wr-clear-later + weakref removed from WeakValueDictionary + call _PyWeakref_ClearRef only for wr with callbacks
    ➜ .\python.bat -m test test_logging -R :
    Running Debug|x64 interpreter...
    Using random seed: 2599319069
    0:00:00 Run 1 test sequentially in a single process
    0:00:00 [1/1] test_logging
    beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
    12345:6789
    XX.2. ....
    0:06:32 [1/1] test_logging passed in 6 min 32 sec
    == Tests result: SUCCESS ==
    1 test OK.
    Total duration: 6 min 32 sec
    Total tests: run=272 skipped=13
    Total test files: run=1/1
    Result: SUCCESS

    # gh-135552-wr-clear-later + call _PyWeakref_ClearRef only for wr with callbacks
    ➜ .\python.bat -m test test_logging -R :
    Running Debug|x64 interpreter...
    Using random seed: 1493424824
    0:00:00 Run 1 test sequentially in a single process
    0:00:00 [1/1] test_logging
    beginning 9 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
    12345:6789
    XX2.. ....
    0:06:29 [1/1] test_logging passed in 6 min 29 sec
    == Tests result: SUCCESS ==
    1 test OK.
    Total duration: 6 min 29 sec
    Total tests: run=272 skipped=13
    Total test files: run=1/1
    Result: SUCCESS

neonene (Contributor) commented Jul 7, 2025

Let's move on to #136345. (The result above looks long here.)

nascheme mentioned this pull request

Jul 8, 2025

GH-91636: Clear weakrefs created by finalizers. #136401

Merged

nascheme added 4 commits

July 8, 2025 12:33

Merge 'origin/main' into gh-135552-wr-clear-later

0060602

Revise NEWS.

900022b

Defer clear for weakrefs without callbacks.

84bd123

This fixes GH-132413 and GH-135552.

Add unit test for GH-132413.

bb29ea1

nascheme requested review from kumaraditya303, pganssle and abalkin as code owners

July 8, 2025 21:54

nascheme (Member, Author) commented Jul 8, 2025

GH-136401 was merged. I think it should be backported as well since it should be pretty low risk.

The main branch has been merged into this PR. I've updated the PR to defer clearing all weakrefs without callbacks (not just weakrefs to classes). This seems more correct to me (rather than special casing weakrefs to classes) and seems the best way to fix GH-132413.

I think GH-132413 is actually more likely to be encountered in real code, for example if you have some logging function in a __del__ method that uses the datetime module. So, maybe this PR should be backported to 3.13 and 3.14 as well.
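
The kind of user code meant here might look like the sketch below. It is a guess at the pattern, with a hypothetical Resource class and messages list; it runs the finalizer during normal execution and does not reproduce the failure from GH-132413.

```python
import datetime

messages = []


class Resource:
    def __del__(self):
        # Touches the datetime module from inside a finalizer.
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        messages.append(f"{stamp} Resource finalized")


resource = Resource()
del resource  # refcount drops to zero, so __del__ runs here in CPython
print(messages)
```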

sergey-miryanov reviewed Jul 9, 2025

Python/gc.c (resolved)

sergey-miryanov (Contributor) commented Jul 9, 2025

Looks good to me. Thanks for detailed comments!

neonene reviewed Jul 9, 2025

Lib/test/test_finalization.py (resolved)

Lib/test/test_gc.py (outdated, resolved)

Lib/test/test_weakref.py (outdated, resolved)

nascheme added 2 commits

July 9, 2025 13:42

Add additional tests for weakref clearing.

073409b

Revert unneeded changes to unit tests.

135223e

nascheme (Member, Author) commented Jul 14, 2025

@pablogsal are you planning to take another look at this? If so, no problem, there is no great rush. If not, I think I'll do another review pass myself and then merge.

pablogsal (Member) commented Jul 14, 2025 (edited)

Yeah, I have it pending for this week (sorry, fixing a lot of bugs for 3.14 currently), but if you feel I am taking too much time I am happy for you to go ahead. I don't want to block it on this.

sergey-miryanov (Contributor) commented Jul 14, 2025

@nascheme I see you updated the comment for handle_weakrefs and removed mentions of GC_REACHABLE and GC_TENTATIVELY_UNREACHABLE.

Would you mind also removing another one from the subtract_refs comment?

cpython/Python/gc.c

Lines 572 to 576 in 9363703

	/* Subtract internal references from gc_refs. After this, gc_refs is >= 0
	* for all objects in containers, and is GC_REACHABLE for all tracked gc
	* objects not in containers. The ones with gc_refs > 0 are directly
	* reachable from outside containers, and so can't be collected.
	*/

efimov-mikhail (Contributor) commented Jul 14, 2025

Do we want to merge this PR w/o tests on subclasses destruction from #135552?
Or maybe it will be better to place them here?

nascheme (Member, Author) commented Jul 14, 2025

> Do we want to merge this PR w/o tests on subclasses destruction from #135552?
> Or maybe it will be better to place them here?

Sergey has tests in another PR and I'll merge that right after I merge this one.

kumaraditya303 reviewed Jul 15, 2025

Lib/test/test_finalization.py

-		self.assertIs(wr(), None)
+		# This used to be None because weakrefs were cleared before
+		# calling finalizers. Now they are cleared after.
+		self.assertIsNot(wr(), None)

kumaraditya303 (Contributor) commented Jul 15, 2025

Suggested change

-	self.assertIsNot(wr(), None)
+	self.assertIsNotNone(wr())

Labels

awaiting core review

7 participants


gh-135552: Make the GC clear weakrefs later. #136189
