gh-132942: Fix races in type lookup cache #133032
Here is a script that triggers the crash. It can take a while, especially if running under "rr".
LGTM
Merged commit 31d1342 into python:main.
Thanks @nascheme for the PR 🌮🎉. I'm working now to backport this PR to: 3.13.
Sorry, @nascheme, I could not cleanly backport this to
Two races related to the type lookup cache, when used in the free-threaded build. This caused test_opcache to sometimes fail (as well as other hard-to-reproduce failures).
The first problem is that `find_name_in_mro()` can block on some mutex and then release critical sections. If that happens, the type version used for the cache entry can be wrong (too new). Assigning the version before doing the find fixes this issue. If it does race, you will add an entry that uses an out-of-date version.

The second problem was much harder to track down. There is a hard-to-trigger race between `update_cache()`, writing to the cache, and `_PyType_LookupStackRefAndVersion()`, reading from the cache. We use a sequence lock to avoid races. However, if the reader reads the old entry value and the new entry version, it will try to execute `_Py_TryXGetStackRef()` on a stale cache entry value. If that value has been deallocated, `PyStackRef_XCLOSE()` will crash. This could happen before because the version was written first and the new value second.

The fix is simply to write the entry value first and the version after. That way, the reader always sees a value at least as new as the version.
Possible scenarios for the reader of the cache entry, as it is being written to concurrently: