Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.7k
gh-114271: Fix race inThread.join()#114839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
There is a race between when `Thread._tstate_lock` is released[^1] in `Thread._wait_for_tstate_lock()`and when `Thread._stop()` asserts[^2] that it is unlocked. Consider the following executioninvolving threads A, B, and C:1. A starts.2. B joins A, blocking on its `_tstate_lock`.3. C joins A, blocking on its `_tstate_lock`.4. A finishes and releases its `_tstate_lock`.5. B acquires A's `_tstate_lock` in `_wait_for_tstate_lock()`, releases it, but is swapped out before calling `_stop()`.6. C is scheduled, acquires A's `_tstate_lock` in `_wait_for_tstate_lock()` but is swapped out before releasing it.7. B is scheduled, calls `_stop()`, which asserts that A's `_tstate_lock` is not held. However, C holds it, so the assertion fails.The race can be reproduced[^3] by inserting sleeps at the appropriate points inthe threading code. To do so, run the `repro_join_race.py` from the linked repo.There are two main parts to this PR:1. `_tstate_lock` is replaced with an event that is attached to `PyThreadState`. The event is set by the runtime prior to the thread being cleared (in the same place that `_tstate_lock` was released). `Thread.join()` blocks waiting for the event to be set.2. `_PyInterpreterState_WaitForThreads()` provides the ability to wait for all non-daemon threads to exit. To do so, an `is_daemon` predicate was added to `PyThreadState`. This field is set each time a thread is created. `threading._shutdown()` now calls into `_PyInterpreterState_WaitForThreads()` instead of waiting on `_tstate_lock`s.[^1]:https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1201[^2]:https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1115[^3]:8194653
mpage commentedFeb 1, 2024
Just a heads up - please don't look at this until it's marked as ready for review. There are a couple of failing CI jobs that need to be addressed. |
When we have `typedef struct _PyEventRc _PyEventRc` in `pstate.h`the C analyzer can't decide between that and the definition in `pycore_lock.h`.
mpage commentedFeb 6, 2024
@colesbury - This is good to go. Would you please have a look? |
mpage commentedFeb 6, 2024
Putting this back in draft. Going to see if I can move the join logic into |
mpage commentedMar 6, 2024
@colesbury - Could you take a look at this when you get a chance, please? |
colesbury left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Overall, the approach looks nice!
I'm still looking this over, but I have some questions aboutthread._handle. Currently,thread._handle starts out asNone and is then set to aThreadHandle after the thread is started. ThisNone ->ThreadHandle transition seems like it adds complexity and I think there may be a few bugs related to it (see comments below).
Would it be possible to always have a non-Noneself._handle by pushing a bit more state intoThreadHandle?
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
ericsnowcurrently commentedMar 14, 2024
I'd be glad to follow-through on the changes I posted ingh-116776, if that would help you. |
mpage commentedMar 14, 2024
@ericsnowcurrently - Thanks! Is that something you think you can get done quickly (i.e. in the next few days)? If so, I'd be happy to stack on top of those changes once you're done. |
ericsnowcurrently commentedMar 14, 2024
Yeah, I'll aim to wrap it up tomorrow. |
pitrou left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
+1. I really like the simplification here!
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
ericsnowcurrently commentedMar 15, 2024
I'll defer to@pitrou's assessment on this. I'd rather this land sooner than later and don't want my pace to hold things up. Again, thanks for working on this. (I'll follow up separately on any cleanup, about the "main" thread, and about our approach to managing thread lifetimes. I actually appreciate not feeling so rushed to sort those out. 😄) |
ericsnowcurrently commentedMar 15, 2024
FWIW, this change feels a bit closer to what I had in mind last year:#110848 (comment). |
mpage commentedMar 15, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
The |
pitrou commentedMar 15, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
pitrou commentedMar 15, 2024
@gpshead Would you like to take a look at this? |
pitrou commentedMar 16, 2024
Ok, TSAN tests passed after merging from main (though there are too few of them currently). |
pitrou commentedMar 16, 2024
Updated again to get test_threading in the set of TSAN tests. |
There is a race between when `Thread._tstate_lock` is released[^1] in `Thread._wait_for_tstate_lock()`and when `Thread._stop()` asserts[^2] that it is unlocked. Consider the following executioninvolving threads A, B, and C:1. A starts.2. B joins A, blocking on its `_tstate_lock`.3. C joins A, blocking on its `_tstate_lock`.4. A finishes and releases its `_tstate_lock`.5. B acquires A's `_tstate_lock` in `_wait_for_tstate_lock()`, releases it, but is swapped out before calling `_stop()`.6. C is scheduled, acquires A's `_tstate_lock` in `_wait_for_tstate_lock()` but is swapped out before releasing it.7. B is scheduled, calls `_stop()`, which asserts that A's `_tstate_lock` is not held. However, C holds it, so the assertion fails.The race can be reproduced[^3] by inserting sleeps at the appropriate points inthe threading code. To do so, run the `repro_join_race.py` from the linked repo.There are two main parts to this PR:1. `_tstate_lock` is replaced with an event that is attached to `PyThreadState`. The event is set by the runtime prior to the thread being cleared (in the same place that `_tstate_lock` was released). `Thread.join()` blocks waiting for the event to be set.2. `_PyInterpreterState_WaitForThreads()` provides the ability to wait for all non-daemon threads to exit. To do so, an `is_daemon` predicate was added to `PyThreadState`. This field is set each time a thread is created. `threading._shutdown()` now calls into `_PyInterpreterState_WaitForThreads()` instead of waiting on `_tstate_lock`s.[^1]:https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1201[^2]:https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1115[^3]:mpage@8194653---------Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>Co-authored-by: Antoine Pitrou <antoine@python.org>
There is a race between when `Thread._tstate_lock` is released[^1] in `Thread._wait_for_tstate_lock()`and when `Thread._stop()` asserts[^2] that it is unlocked. Consider the following executioninvolving threads A, B, and C:1. A starts.2. B joins A, blocking on its `_tstate_lock`.3. C joins A, blocking on its `_tstate_lock`.4. A finishes and releases its `_tstate_lock`.5. B acquires A's `_tstate_lock` in `_wait_for_tstate_lock()`, releases it, but is swapped out before calling `_stop()`.6. C is scheduled, acquires A's `_tstate_lock` in `_wait_for_tstate_lock()` but is swapped out before releasing it.7. B is scheduled, calls `_stop()`, which asserts that A's `_tstate_lock` is not held. However, C holds it, so the assertion fails.The race can be reproduced[^3] by inserting sleeps at the appropriate points inthe threading code. To do so, run the `repro_join_race.py` from the linked repo.There are two main parts to this PR:1. `_tstate_lock` is replaced with an event that is attached to `PyThreadState`. The event is set by the runtime prior to the thread being cleared (in the same place that `_tstate_lock` was released). `Thread.join()` blocks waiting for the event to be set.2. `_PyInterpreterState_WaitForThreads()` provides the ability to wait for all non-daemon threads to exit. To do so, an `is_daemon` predicate was added to `PyThreadState`. This field is set each time a thread is created. `threading._shutdown()` now calls into `_PyInterpreterState_WaitForThreads()` instead of waiting on `_tstate_lock`s.[^1]:https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1201[^2]:https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1115[^3]:mpage@8194653---------Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>Co-authored-by: Antoine Pitrou <antoine@python.org>
There is a race between when `Thread._tstate_lock` is released[^1] in `Thread._wait_for_tstate_lock()`and when `Thread._stop()` asserts[^2] that it is unlocked. Consider the following executioninvolving threads A, B, and C:1. A starts.2. B joins A, blocking on its `_tstate_lock`.3. C joins A, blocking on its `_tstate_lock`.4. A finishes and releases its `_tstate_lock`.5. B acquires A's `_tstate_lock` in `_wait_for_tstate_lock()`, releases it, but is swapped out before calling `_stop()`.6. C is scheduled, acquires A's `_tstate_lock` in `_wait_for_tstate_lock()` but is swapped out before releasing it.7. B is scheduled, calls `_stop()`, which asserts that A's `_tstate_lock` is not held. However, C holds it, so the assertion fails.The race can be reproduced[^3] by inserting sleeps at the appropriate points inthe threading code. To do so, run the `repro_join_race.py` from the linked repo.There are two main parts to this PR:1. `_tstate_lock` is replaced with an event that is attached to `PyThreadState`. The event is set by the runtime prior to the thread being cleared (in the same place that `_tstate_lock` was released). `Thread.join()` blocks waiting for the event to be set.2. `_PyInterpreterState_WaitForThreads()` provides the ability to wait for all non-daemon threads to exit. To do so, an `is_daemon` predicate was added to `PyThreadState`. This field is set each time a thread is created. `threading._shutdown()` now calls into `_PyInterpreterState_WaitForThreads()` instead of waiting on `_tstate_lock`s.[^1]:https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1201[^2]:https://github.com/python/cpython/blob/441affc9e7f419ef0b68f734505fa2f79fe653c7/Lib/threading.py#L1115[^3]:mpage@8194653---------Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>Co-authored-by: Antoine Pitrou <antoine@python.org>
…hon 3.13`Thread._is_stopped` was removed inpython/cpython#114839 in Python 3.13(cherry picked from commit d568bc288aed00268ffeef137b9b901f480964ef)IJ-MR-137093GitOrigin-RevId: aa8b27b09db85f69b44d6fbbd333f8ec32489d33
Uh oh!
There was an error while loading.Please reload this page.
This PR makes thread joining thread-safe in free-threaded builds andfixesgh-116372, which affects both default and free-threaded builds.
The combination of
_tstate_lockand_ThreadHandleis replaced by_ThreadHandleand the supporting infrastructure for_tstate_lockis removed. Joining a thread now joins its_ThreadHandle. The_ThreadHandles for all threads that we want to join on shutdown (non-daemon threads that were created by the threading module) are tracked in a linked-list attached to the_threadmodule. At shutdown we iterate over this list and join the handles.There is one wrinkle with this approach: the
_MainThreadinstance. The_MainThreadinstance is joinable. Previously, it represented whatever thread imported thethreadingmodule. Typically this is the main thread of the main interpreter, but in some rare circumstances (embedding scenarios?) it might not be. The previous approach allowed theMainThreadinstance to be joined even if it wasn't the main thread of the main interpreter, but this isn't possible with the current approach. To work around this we change the_MainThreadto always represent the main thread of the runtime, also fixinggh-83223._threadmodule.cthread-safe in--disable-gilbuilds #114271