Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork34.3k
gh-135953: Properly obtain main thread identifier in Gecko Collector#146045
gh-135953: Properly obtain main thread identifier in Gecko Collector#146045pablogsal merged 7 commits intopython:mainfrom
Conversation
Since running a profiler via CLI (python -m profiling.sampling run)spawns a new subprocess where the actual user-specified code will run,a call to threading.main_thread() in the collector's process will notreturn the profiled process's main thread.To combat this, we rely on the fact that thread objects are insertedin such a way that the first object in the list represents the oldestThreadState object [1], which corresponds to a ThreadState associatedwith the main thread.[1] -https://github.com/python/cpython/blob/1b118353bb0a9d816de6ef673f3b11775de5bec5/Include/internal/pycore_interp_structs.h#L831Signed-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
python-cla-botbot commentedMar 17, 2026 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
The ordering is actually newest -> oldest, since the _remote_debuggingcode traverses the ThreadState linked list in the intepreter state, appendingto a list of threads in-order.Signed-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
It seems to sometimes happen that some samples (possibly at the very start / end)do not have any active threads in the interpreter state.It is OK to make main_tid maybe None here, since if it is, the threads list isempty, and so the for loop will not execute anyways.Signed-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
pablogsal commentedMar 20, 2026
Thanks for the PR! I think we should do this but I find the approach a bit flaky. We have enough information in the C layer to know what thread is the main thread so we can surface that in the data that we emit (or alternatively we can add to the unwinder a new method that tells us the tid of the main thread). Do you feel comfortable to do this in the C code? |
…tateSigned-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
flowln commentedMar 21, 2026
@pablogsal I've made the minimum set of changes in the |
… in Gecko testSigned-off-by: Sofia Donato Ferreira <flowlnlnln@gmail.com>
Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
pablogsal left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for working on this. Moving the main-thread identification into C is the right direction, but I don't think this version is ready to merge yet.
There are two correctness issues with the current shape:
main_thread_idcan remain unset in the live unwinder path. Inmodule.cit is only populated if the loop actually visits the interpreter's mainPyThreadState, but in filtered modes (CPU,GIL,EXCEPTION)unwind_stack_for_thread()can skip that thread entirely, and theonly_active_thread/ targeted-thread paths can avoid it as well.The new
InterpreterInfo.main_thread_idfield is not wired through binary replay.InterpreterInfonow has 3 fields, but the binary writer does not serializemain_thread_id, and the binary reader only fills items 0 and 2. That meansreplay ... --format geckowill still lose the main-thread marker.
I think the cleanest fix is to model this per thread rather than per interpreter: either add anis_main_thread field onThreadInfo or, better, a newTHREAD_STATUS_* bit. That avoids depending on the main thread being present in a given sample, and it should flow through the existing binary status byte naturally.
If you prefer to keep the currentmain_thread_id approach, then I think we at least need to: usePy_None when the value is unknown, extend the binary format reader/writer for the new field, and add tests for binary-to-Gecko replay plus a filtered-mode case.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
pablogsal commentedMar 22, 2026
I can give this a go, don't worry |
…terInfo fieldReplace the main_thread_id field in InterpreterInfo with aTHREAD_STATUS_MAIN_THREAD status flag set directly in the threadunwinding code. This simplifies the struct from 3 to 2 fields andremoves the need to compare thread IDs at the Python level.
Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
pablogsal left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM
I made the marking via a new status bit, which nicely goes through the binary format :)
Thanks a lot for the PR
a17301a intopython:mainUh oh!
There was an error while loading.Please reload this page.
…8577* 'main' of github.com:python/cpython:pythongh-146197: Run -m test.pythoninfo on the Emscripten CI (python#146332)pythongh-146325: Use `test.support.requires_fork` in test_fastpath_cache_cleared_in_forked_child (python#146330)pythongh-146197: Add Emscripten to CI (python#146198)pythongh-143387: Raise an exception instead of returning None when metadata file is missing. (python#146234)pythongh-108907: ctypes: Document _type_ codes (pythonGH-145837)pythongh-146175: Soft-deprecate outdated macros; convert internal usage (pythonGH-146178)pythongh-146056: Rework ref counting in treebuilder_handle_end() (python#146167) Add a warning about untrusted input to `configparser` docs (python#146276)pythongh-145264: Do not ignore excess Base64 data after the first padded quad (pythonGH-145267)pythongh-146308: Fix error handling issues in _remote_debugging module (python#146309)pythongh-146192: Add base32 support to binascii (pythonGH-146193)pythongh-135953: Properly obtain main thread identifier in Gecko Collector (python#146045)pythongh-143414: Implement unique reference tracking for JIT, optimize unpacking of such tuples (pythonGH-144300)pythongh-146261: Fix bug in `_Py_uop_sym_set_func_version` (pythonGH-146291)pythongh-145144: Add more tests for UserList, UserDict, etc (pythonGH-145145)pythongh-143959: Fix test_datetime if _datetime is unavailable (pythonGH-145248)pythongh-146245: Fix reference and buffer leaks via audit hook in socket module (pythonGH-146248)pythongh-140049: Colorize exception notes in `traceback.py` (python#140051) Update docs forpythongh-146056 (pythonGH-146213)
Uh oh!
There was an error while loading.Please reload this page.
Since running a profiler via CLI (
python -m profiling.sampling run) spawns a new subprocess where the actual user-specified code will run, a call tothreading.main_thread()in the collector's process will not return the profiled process's main thread.To combat this, we add a new attribute to the returned
InterpreterStatefor the sampled stack frames, which takes the id of thethread_mainmember from the original sampled interpreter's state.This allows the main thread to be highlighted on Firefox Profiler:
