Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.3k
gh-112529: Make the GC scheduling thread-safe#114880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
The GC keeps track of the number of allocations (less deallocations)since the last GC. This buffers the count in thread-local state and usesatomic operations to modify the per-interpreter count. The thread-localbuffering avoids contention on shared state.A consequence is that the GC scheduling is not as precise, so"test_sneaky_frame_object" is skipped because it requires that the GC berun exactly after allocating a frame object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The GC is one of the runtime components with which I am least familiar, so I mostly have questions for you. 😄
Otherwise, the PR mostly makes sense.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
The change here seems okay to me, but I'd feel better if one of the GC experts reviewed this before it's merged. |
I've not looked at the code but the idea of the change sounds fine to me. I suspect there are some users who require that the GC threshold is precise, like the |
| // We buffer the allocation count to avoid the overhead of atomic | ||
| // operations for every allocation. | ||
| gc->alloc_count++; | ||
| if (gc->alloc_count >=LOCAL_ALLOC_COUNT_THRESHOLD) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I wonder if this could be tied to the configurable GC threshold and therefore the tests could continue to pass but maybe it doesn't matter enough and the extra read isn't worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yeah, I considered making it a configurable runtime threshold, but decided it wasn't worth it, at least for now.
I think there's a decent chance we change how we count allocations in the future. In thenogil forks, for example, I accounted for allocations inmi_page_to_full and_mi_page_unfull, which provides some natural batching and avoids the thread-local that's done in every allocation here, but wouldn't allow for a configurable threshold. I haven't attempted that yet because I'd like some performance measurements to justify it first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM!
bedevere-bot commentedFeb 14, 2024
🤖 New build scheduled with the buildbot fleet by@colesbury for commitf99d14e 🤖 If you want to schedule another build, you need to add the🔨 test-with-buildbots label again. |
The GC keeps track of the number of allocations (less deallocations)since the last GC. This buffers the count in thread-local state and usesatomic operations to modify the per-interpreter count. The thread-localbuffering avoids contention on shared state.A consequence is that the GC scheduling is not as precise, so"test_sneaky_frame_object" is skipped because it requires that the GC berun exactly after allocating a frame object.
The GC keeps track of the number of allocations (less deallocations)since the last GC. This buffers the count in thread-local state and usesatomic operations to modify the per-interpreter count. The thread-localbuffering avoids contention on shared state.A consequence is that the GC scheduling is not as precise, so"test_sneaky_frame_object" is skipped because it requires that the GC berun exactly after allocating a frame object.
The GC keeps track of the number of allocations (less deallocations)since the last GC. This buffers the count in thread-local state and usesatomic operations to modify the per-interpreter count. The thread-localbuffering avoids contention on shared state.A consequence is that the GC scheduling is not as precise, so"test_sneaky_frame_object" is skipped because it requires that the GC berun exactly after allocating a frame object.
Uh oh!
There was an error while loading.Please reload this page.
The GC keeps track of the number of allocations (less deallocations) since the last GC. This change buffers the allocation count in thread-local state and uses atomic operations to modify the per-interpreter count. The thread-local buffering avoids contention on shared state.
A consequence is that the GC scheduling is not as precise, so "test_sneaky_frame_object" is skipped because it requires that the GC be run exactly after allocating a frame object.
--disable-gilbuilds #112529