Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-127266: avoid data races when updating type slots v2#133177

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
nascheme wants to merge18 commits intopython:main
base:main
Choose a base branch
Loading
fromnascheme:type-slot-ts-v2

Conversation

nascheme
Copy link
Member

@naschemenascheme commentedApr 29, 2025
edited
Loading

This is an updated version ofGH-131174, which was reverted. I figured the cleanest thing to do is make a new PR.

This is the same as the previous PR with the following additional change. Theupdate_all_slots() andtype_setattro() functions are now more careful when the world is stopped. Instead of doing the MRO lookups while the world is stopped, we do them all first and collect the slot pointers to be updated. Then, we stop the world and do those updates. This makes it much easier to confirm the code running during the stop-the-world is safe and that should avoid the deadlocks.

Thetest_opcache test has become quite a bit slower. It seems to be due to mutex contention in the__getitem__ and__getattribute__ method assignment tests. I reduced the items count from 1000 to 100 to keep the test from becoming much slower.

In the free-threaded build, avoid data races caused by updating typeslots or type flags after the type was initially created.  For those(typically rare) cases, use the stop-the-world mechanism.  Remove theuse of atomics when reading or writing type flags.  The use of atomicsis not sufficient to avoid races (since flags are sometimes read withouta lock and without atomics) and are no longer required.
To avoid deadlocks while the world is stopped, we need to avoid calling APIslike _PyObject_HashFast() and _PyDict_GetItemRef_KnownHash().  Collect theslot updates to be done and then apply them all at once.  This reduces theamount of code running in the stop-the-world condition.
@naschemenascheme added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelApr 30, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@nascheme for commitd511ca6 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133177%2Fmerge

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelApr 30, 2025
Now that stop-the-world is used to do the slot update, these testsare a lot slower in the free-threaded build.  Test with fewer items,which will still hopefully be enough to find bugs in the specializer.
The clearing of Py_TPFLAGS_HAVE_VECTORCALL must be done when the worldis stopped too.
@naschemenascheme added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelApr 30, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@nascheme for commit3cb2256 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133177%2Fmerge

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelApr 30, 2025
@naschemenascheme marked this pull request as ready for reviewApril 30, 2025 13:29
Since we stack allocate one chunk, we need to check 'n' to see if thereare actually any updates to make.  It's pretty common that no updatesare actually needed.
@@ -576,6 +576,7 @@ class TestRacesDoNotCrash(TestBase):
# Careful with these. Bigger numbers have a higher chance of catching bugs,
# but you can also burn through a *ton* of type/dict/function versions:
ITEMS = 1000
SMALL_ITEMS = 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It might be worth investigating this further. I'm surprised there's a large slowdown in this PR, but not in the earlier version. What's the relevant difference?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I timed the most recent version of this PR vs "main", usingSMALL_ITEMS = 1000. It seems the slowdown is not too bad now. I've left it at 100 but we could revert to 1000 if we want to.

I wasn't able to exactly pin-point the reason for the huge slowdown before. Based on "samply" profiling, it looked like there was a lot of contention for theTYPE_LOCK mutex and the STM mutex. Quite a lot of time was spent in_Py_yield(). That's kind of expected since the two writers are both trying to acquire both of those.

@nascheme
Copy link
MemberAuthor

@colesbury Not sure if you would like this approach but it is a variation on your idea to modify the critical section to prevent release of the mutex. I changed it to work with a PyCriticalSection2 as well.

https://github.com/nascheme/cpython/tree/type-slot-ts-release-hack

Not using a critical section for the type dict looks kind of difficult to do. For example,_Py_dict_lookup asserts that the dict is locked. I think we would have to duplicate some dictobject functions to make non-lock-asserting versions.

@colesbury
Copy link
Contributor

Thetype-slot-ts-release-hack approach makes sense to me

@nascheme
Copy link
MemberAuthor

SeeGH-133467 for some remaining data race issues.

If the two mutex form of the critical section is used, need to put theother mutex into '_cs_mutex'.
@naschemenascheme added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelMay 5, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@nascheme for commit3f6222b 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133177%2Fmerge

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelMay 5, 2025
@@ -1075,7 +1177,7 @@ type_modified_unlocked(PyTypeObject *type)
We don't assign new version tags eagerly, but only as
needed.
*/
ASSERT_TYPE_LOCK_HELD();
ASSERT_NEW_TYPE_OR_LOCKED(type);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
ASSERT_NEW_TYPE_OR_LOCKED(type);
ASSERT_NEW_TYPE_OR_LOCKED(type);
assert(!types_world_is_stopped());

Since this is re-entrant, the world shouldn't be stopped.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@kumaraditya303kumaraditya303kumaraditya303 left review comments

@colesburycolesburycolesbury approved these changes

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon is a code owner

Assignees
No one assigned
Labels
awaiting mergetopic-free-threadingtype-bugAn unexpected behavior, bug, or error
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@nascheme@bedevere-bot@colesbury@kumaraditya303

[8]ページ先頭

©2009-2025 Movatter.jp