Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.3k
[3.13] gh-128679: Fix tracemalloc.stop() race conditions#128897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(),PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now checktracemalloc_config.tracing after calling TABLES_LOCK()._PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(),especially setting tracemalloc_config.tracing to 1.Add a test using PyTraceMalloc_Track() to test tracemalloc.stop()race condition.
a92f15b to545053fComparevstinner commentedJan 16, 2025
@ZeroIntensity: Here is a simplified change for the 3.12 and 3.13 branch. Would you mind to review it? |
ZeroIntensity left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
There's a couple other (unmodified) functions where I'm seeing lockless reads:
tracemalloc_get_traceback_PyMem_DumpTraceback_PyTraceMalloc_GetTraces_PyTraceMalloc_GetTracedMemory_PyTraceMalloc_ResetPeak
Uh oh!
There was an error while loading.Please reload this page.
| intres=-1; | ||
| TABLES_LOCK(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We also readtracing without a lock before this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Checkingtracing without the lock is a workaround for the Python finalization. TABLES_LOCK() can no longer be called after _PyTraceMalloc_Fini() (the lock is destroyed), whereas _PyTraceMalloc_TraceRef() is still called after _PyTraceMalloc_Fini().
tracing is checked again once we acquired the lock.
The issue was fixed properly in the main branch by calling(void)PyRefTracer_SetTracer(NULL, NULL); in_PyTraceMalloc_Stop(). Since this change is designed to be backported to Python 3.12, I didn't include this fix.
Once this change is merged in 3.13 and 3.12, we can remove the unprotectedtracing read in 3.13 in_PyTraceMalloc_TraceRef() by adding(void)PyRefTracer_SetTracer(NULL, NULL);.
Uh oh!
There was an error while loading.Please reload this page.
vstinner commentedJan 16, 2025
Ok, I completed my PR to cover most (almost all?) functions, instead of focusing on alloc/realloc/free and Track/Untrack functions. @ZeroIntensity: Please review the updated PR. |
vstinner commentedJan 16, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Fixing properly In the main branch, I redesigned the code to be able to hold the lock for all tables operations. For that, I had to modify the behavior:
I'm open to consider fixing |
ZeroIntensity commentedJan 16, 2025
It looks like tests somehow were broken. |
So TABLES_LOCK() can be called in _PyMem_DumpTraceback() even iftracemalloc is not tracing.
vstinner commentedJan 16, 2025
Fixed. Well... Fixing more functions require to backport more changes. I backported the |
ZeroIntensity left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM. With the exception of a few special cases that you've mentioned, everything is properly locked! Due to the changes to the lifecycle and the size of the change, I think we should run buildbots before merging.
bedevere-bot commentedJan 16, 2025
🤖 New build scheduled with the buildbot fleet by@ZeroIntensity for commit54c8f9e 🤖 If you want to schedule another build, you need to add the🔨 test-with-buildbots label again. |
vstinner commentedJan 16, 2025
|
vstinner commentedJan 16, 2025
I mark this PR as a draft because of the FreeBSD crash. Oh, I can also reproduce the |
vstinner commentedJan 16, 2025
So far, I cannot reproduce the crash on Linux. |
vstinner commentedJan 16, 2025
I can trigger a crash if I insert a delay in diff --git a/Objects/obmalloc.c b/Objects/obmalloc.cindex b103deb01ca..c20eb47e8a8 100644--- a/Objects/obmalloc.c+++ b/Objects/obmalloc.c@@ -859,7 +859,14 @@ set_allocator_unlocked(PyMemAllocatorDomain domain, PyMemAllocatorEx *allocator) { switch(domain) {- case PYMEM_DOMAIN_RAW: _PyMem_Raw = *allocator; break;+ case PYMEM_DOMAIN_RAW:+ _PyMem_Raw.ctx = allocator->ctx;+ (void)sched_yield();+ _PyMem_Raw.malloc = allocator->malloc;+ _PyMem_Raw.calloc = allocator->calloc;+ _PyMem_Raw.realloc = allocator->realloc;+ _PyMem_Raw.free = allocator->free;+ break; case PYMEM_DOMAIN_MEM: _PyMem = *allocator; break; case PYMEM_DOMAIN_OBJ: _PyObject = *allocator; break; /* ignore unknown domain */ The write is protected by a lock (write), but |
vstinner commentedJan 18, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Only debug build are affected, not release build. I modified the test in the main branch to skip it on debug build:#128988 |
…ython#128988)There is a race condition between PyMem_SetAllocator() andPyMem_RawMalloc()/PyMem_RawFree(). While PyMem_SetAllocator() writeis protected by a lock, PyMem_RawMalloc()/PyMem_RawFree() reads arenot protected by a lock. PyMem_RawMalloc()/PyMem_RawFree() can becalled with an old context and the new function pointer.On a release build, it's not an issue since the context is not used.On a debug build, the debug hooks use the context and so can crash.(cherry picked from commit9bc1964)
6b47499 intopython:3.13Uh oh!
There was an error while loading.Please reload this page.
Thanks@vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12. |
Sorry,@vstinner, I could not cleanly backport this to |
GH-129022 is a backport of this pull request to the3.12 branch. |
…129022)[3.13]gh-128679: Fix tracemalloc.stop() race conditions (#128897)tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(),PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now checktracemalloc_config.tracing after calling TABLES_LOCK()._PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(),especially setting tracemalloc_config.tracing to 1.Add a test using PyTraceMalloc_Track() to test tracemalloc.stop()race condition.Call _PyTraceMalloc_Init() at Python startup.(cherry picked from commit6b47499)
Uh oh!
There was an error while loading.Please reload this page.
tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(), PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check tracemalloc_config.tracing after calling TABLES_LOCK().
_PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(), especially setting tracemalloc_config.tracing to 1.
Add a test using PyTraceMalloc_Track() to test tracemalloc.stop() race condition.