NotificationsYou must be signed in to change notification settings
Fork32.3k
Star67.8k

gh-132519: fix excessive mem usage in QSBR with large blocks#132520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Closed

tom-pytel wants to merge2 commits intopython:mainfromtom-pytel:fix-issue-132519

Closed

gh-132519: fix excessive mem usage in QSBR with large blocks#132520

tom-pytel wants to merge2 commits intopython:mainfromtom-pytel:fix-issue-132519

Conversation

Copy link

Contributor

tom-pytel commentedApr 14, 2025•
edited
Loading

Memory usage numbers (proposed fix explained below):

             VmHWMGIL      135104 kB  - normal GIL-enabled baselinenoGIL   6702788 kB  - free-threaded current QSBR behaviorfix      517760 kB  - free-threaded with _PyMem_ProcessDelayed() in _Py_HandlePending()

Test script:

importthreadingfromqueueimportQueuedefthrdfunc(queue):whileTrue:l=queue.get()l.append(0)# force resize in non-parent thread which will free using _PyMem_FreeDelayed()queue=Queue(maxsize=2)threading.Thread(target=thrdfunc,args=(queue,)).start()whileTrue:l= [None]*int(3840*2160*3/8)# sys.getsizeof(l) ~= 3840*2160*3 bytesqueue.put(l)

Delayed memory free checks (and subsequent frees if applicable) currently only occur in one of two situations:

Garbage collection, which doesn't trigger often enough in this script, though manual trigger solves problem.
On a_PyMem_FreeDelayed() when the number of pending delayed free memory blocks reaches exactly 254. And then it waits another 254 frees even if could not free any pending blocks this time, which is a lot for big buffers.

This works great for many small objects, but with larger buffers these can accumulate quickly, so more frequent checks should be done.

I tried a few things but_PyMem_ProcessDelayed() added to_Py_HandlePending() seems to work well and be safe and aQSBR_QUIESCENT_STATE has just been reported so there is a fresh chance to actually free. Seems to happen often enough that memory usage is kept down, and if nothing to free then_PyMem_ProcessDelayed() is super-cheap.

Another option would be to track the amount of pending memory to be freed and increase the frequency of free attempts if that number gets too large, but to start with this small change seems to solved the problem well enough. Could also schedule GC if pending frees get too high, but that seems like a roundabout way to arrive at_PyMem_ProcessDelayedNoDealloc().

Performance as checked bypyperformance full suite is unchanged with the fix (literally 0.17% better avg, so noise).

Issue:Excessive QSBR memory usage when delay freeing large blocks. #132519

pythongh-132519: fix excessive mem usage in QSBR with large blocks

d29dc7c

tom-pytel requested a review fromericsnowcurrently as acode owner

April 14, 2025 15:20

bedevere-appbot added the awaiting review label

Apr 14, 2025

bedevere-appbot mentioned this pull request

Apr 14, 2025

Excessive QSBR memory usage when delay freeing large blocks.#132519

Closed

📜🤖 Added by blurb_it.

4037c10

Copy link

ContributorAuthor

tom-pytel commentedApr 14, 2025

Ping@colesbury,@kumaraditya303. Is there a better place for the_PyMem_ProcessDelayed()? I thought_PyThreadState_Attach() at first but that is too low level.

colesbury reviewed

Apr 14, 2025

View reviewed changes

Copy link

Contributor

colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't think we should do this. You risk accidentally introducing quadratic behavior.

We will likely tweak the heuristics in the future for when_PyMem_ProcessDelayed() is called, but that should be based on data for real applications.