gh-132519: fix excessive mem usage in QSBR with large blocks#132520
Conversation
Ping @colesbury, @kumaraditya303. Is there a better place for the |
I don't think we should do this. You risk accidentally introducing quadratic behavior.

We will likely tweak the heuristics in the future for when `_PyMem_ProcessDelayed()` is called, but that should be based on data for real applications.
Memory usage numbers (proposed fix explained below):
Test script:
Delayed memory free checks (and subsequent frees if applicable) currently only occur in `_PyMem_FreeDelayed()`, when the number of pending delayed free memory blocks reaches exactly 254. It then waits another 254 frees before checking again, even if it could not free any pending blocks this time, which is a long wait for big buffers. This works great for many small objects, but with larger buffers these can accumulate quickly, so more frequent checks should be done.
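To make the problem concrete, here is a minimal sketch of the counter-based heuristic described above. The names (`delayed_state`, `free_delayed`, `DELAYED_FREE_THRESHOLD`) are hypothetical, not CPython's actual internals; the point is that the counter resets whether or not anything could actually be freed, so a large buffer can sit unreclaimed for another full threshold's worth of frees:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch (not CPython's real code): every delayed free bumps
 * a counter, and a processing pass over the pending list is attempted only
 * once the counter hits a fixed threshold (254 in the description above). */

#define DELAYED_FREE_THRESHOLD 254

typedef struct {
    size_t num_pending;  /* delayed frees recorded since the last pass */
} delayed_state;

/* Record one delayed free; return 1 if a processing pass should run now. */
static int
free_delayed(delayed_state *st)
{
    st->num_pending++;
    if (st->num_pending < DELAYED_FREE_THRESHOLD) {
        return 0;  /* below threshold: keep accumulating */
    }
    st->num_pending = 0;  /* resets even if nothing turns out to be freeable */
    return 1;
}
```

With this shape, 508 consecutive delayed frees trigger only two processing passes, regardless of how large the retired blocks are.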
I tried a few things, but `_PyMem_ProcessDelayed()` added to `_Py_HandlePending()` seems to work well and be safe: a `QSBR_QUIESCENT_STATE` has just been reported, so there is a fresh chance to actually free. This seems to happen often enough that memory usage is kept down, and if there is nothing to free then `_PyMem_ProcessDelayed()` is super-cheap.

Another option would be to track the amount of pending memory to be freed and increase the frequency of free attempts if that number gets too large, but to start with, this small change seems to solve the problem well enough. We could also schedule GC if pending frees get too high, but that seems like a roundabout way to arrive at `_PyMem_ProcessDelayedNoDealloc()`.

Performance as checked by the full `pyperformance` suite is unchanged with the fix (literally 0.17% better on average, so noise).