Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-127266: avoid data races when updating type slots#131174

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
nascheme merged 54 commits intopython:mainfromnascheme:gh-127266-type-slots-ts
Apr 28, 2025

Conversation

nascheme
Copy link
Member

@naschemenascheme commentedMar 13, 2025
edited
Loading

Avoid data races when updating type slots and type flags by "stopping-the-world" first. This is only needed if the type is potentially exposed to multiple threads. There are a couple places this happens:

  • type_setattro() when the assigned item is a "dunder" method that corresponds to one or more slot
  • assigning to__bases__
  • _abc.c calling_PyType_SetFlags() to set the collection flags (SEQUENCE or MAPPING)

Two examples of code that causes this:dataclasses will assign dunder methods after the type object has been created, when the decorator runs. That triggers this and since the slots are assigned one-by-one, we stop for each assignment. Not great. Another example is unittestMagicMock. Again, it assigns dunder methods after the class is defined.

In order to assess the performance impact of this change, I've measured the following things. The time it takes to run the unit test suite does not seem significantly worse, even when running tests serially using-j 1. Comparing the pyperformance results before and after this change shows no significant change. See link below for the chart.

Running with TSAN enabled and running the--tsan and--tsan-parallel --parallel-threads=4 tests show no warnings. I also have a modified pyperformance test suite that I can run in parallel and that shows no TSAN warnings related to this as well.

Additional changes after feedback from Sam:

  • Replace the critical sections that useTYPE_LOCK with a simple mutex instead. To avoid dead-locks, the code was analyzed for re-entrancy and adjusted as needed. There are a couple places where the mutex is dropped before calling something that's possibly re-entrant.
  • Remove some locking where it is not needed. Slots can only be changed when the world is stopped (e.g. readingtp_mro no longer needs to hold the lock).
  • Createtype_lookup_ex() as a lower-level version of _PyType_LookupStackRefAndVersion that optionally acquires theTYPE_LOCK or requires that it is already held.
  • Add a new type flag that is used for debug enabled builds. It tries to check if a type is potentially exposed to multiple threads (and therefore needs stop-the-world before modifying slots). This flag could perhaps be removed before the final merge of this PR (since type flag bits are kind of in short supply).

Benchmark using pyperformance

@kumaraditya303
Copy link
Contributor

kumaraditya303 commentedMar 14, 2025
edited
Loading

Two examples that cause this: dataclasses will assign dunder methods after the type object has been created. That triggers this and since the slots are assigned one-by-one, we stop for each assignment. Not great. Another example is unittest MockMock. Again, it assigns dunder methods after the class is defined.

We can avoid stop the world pauses in both cases by checking if the type is uniquely ref by the current thread, the type object is newly created here and not exposed to other threads so we can assign without stopping the world.

@naschemenascheme marked this pull request as ready for reviewMarch 14, 2025 21:37
Copy link
Contributor

@colesburycolesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm concerned about atomicity here (see comment below).

Comment on lines 11314 to 11315
PyInterpreterState *interp = _PyInterpreterState_GET();
_PyEval_StopTheWorld(interp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

_PyEval_StopTheWorld() is always defined and is a no-op in the default build.

@colesbury
Copy link
Contributor

We might be able to makeBEGIN_TYPE_LOCK() a regular mutex acquisition instead of a critical section, but we'll have to be careful that nothing betweenBEGIN_TYPE_LOCK() andEND_TYPE_LOCK() leads to reentrancy.

For TYPE_LOCK, replace critical sections with a simple mutex.  A numberof changes were needed to avoid deadlocking on reentrancy (trying tore-acquire the mutex). Split _PyType_LookupRefAndVersion() into a lockedand unlocked version.  Remove atomic operations where they are notrequired. Remove some cases of TYPE_LOCK being held that is notrequired.
@naschemenascheme marked this pull request as draftMarch 26, 2025 07:49
@nascheme
Copy link
MemberAuthor

I've convertedTYPE_LOCK from being used as a critical section to being a regular mutex. There was a number of cases of reentrancy and so things got complex. I think I nearly have that sorted out now. Themro_invoke() function needs work yet. At least, unit tests pass without deadlocking. One somewhat ugly part yet is that_PyType_LookupRefAndVersion() got split into two versions with only slight differences between the functions (taking lock if needed vs caller already holding lock or the lock not being needed). Perhaps I can find a way to refactor it to avoid code duplication.

@naschemenascheme marked this pull request as ready for reviewMarch 27, 2025 15:49
Copy link
Contributor

@colesburycolesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nice!

I think there's still too much mixing of the type lock and the stop the world. I think data should either be protected by the type lock OR by stop-the-world, but not both.

From reading through this:

type lock:

  • tp_version_tag
  • _spec_cache

stop the world:

  • tp_flags (modifications)
  • bases (modifications)

Additionally, there's a few functions that are named with_lock_held but are called in a stop-the-world pause (not with the type lock held), which I found a bit confusing.

Comment on lines 3552 to 3554
types_mutex_unlock();
new_mro = mro_invoke(type); /* might cause reentrance */
types_mutex_lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This doesn't seem right to me. This can be called with both the type lock held and the world stopped. Ifmro_invoke can call arbitrary code, it needs to start the world around the call.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, that can call a custommro() method on the meta class. The only thing in Lib that does that is atest_descr test but obviously we need to handle it.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I believe this has been resolved. Insidemro_invoke() I either start-the-world before callingmro() (if it's custom) or I unlock the mutex. The latter case happens duringtype_ready().

@nascheme
Copy link
MemberAuthor

I'm not convinced that the switch of the type lock from a critical section to a plain mutex is profitable here.

Okay, I've reverted back to using a critical section forTYPE_LOCK.

This avoids the watcher callback potentially using invalid cacheentries.
* Since type_modified_unlocked() can be called with the world stopped  then we need to re-start if making re-entrant calls (like calling  type watcher callbacks).* Re-order code before type_modified_unlocked().  This fixes a potential  bug if called functions end up re-assigning the type version tag.  If  they do, the cache can contain invalid entries.  We should do the  dictionary update first then set the version to zero.* Replace 'is_new_type' with 'world_stopped' boolean parameter.  This  seems a bit more clear.
Rather than trying to update tp_base, tp_bases, and tp_mro while theworld is stopped, hold TYPE_LOCK when updating them.  This is much moresimilar to how the base version of the code works and requires manyfewer changes.  Using stop-the-world turned out to be difficult to makework correctly, mostly due to custom mro() methods but also for someother reasons (for example, functions that potentially re-enter theinterpreter).This re-work requires that we can call stop-the-world inside a criticalsection and have the mutex still held after re-starting.  This seems towork and I've added asserts to confirm it.  Stop-the-world is now onlyused when updating either tp_flags or other type slots and not the threetype members listed above.
@nascheme
Copy link
MemberAuthor

nascheme commentedApr 23, 2025
edited
Loading

Major rework, reverting back to usingTYPE_LOCK and critical sections to protect tp_mro, tp_bases, tp_base. In a previous version of this PR, I was making updates to those only while the world was stopped. Because we are so limited on what APIs you can call while the world is stopped it is hard to make it work without major overhaul to how types work. Themro() method on the metatype is particularly troublesome but there are other similar problems.

Here is a summary of what's changed vs the main branch:

  • Update tp_flags and type slots (other than tp_base, tp_bases, and tp_mro) only while the world is stopped. This allows us to read those type fields without atomics and without locking.
  • Add_Py_TYPE_REVEALED_FLAG as a debug build check to help prevent code that unsafely writes to a type (missing lock or stop-the-world).
  • Fix a data race with the pname/ptrs "little cache" forresolve_slotdups(). That cache is not safe in the free-threaded build so disable it.
  • A few code cleanup things that are not strictly required but leftover from the previous version of this PR. I think they improve the code so I left them in.

Updated pyperformance run.

These are not required as part of this PR and can be reviewedseparately.
Copy link
Contributor

@colesburycolesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This LGTM once the issue withtest_opcache is resolved

Now that these slot updates use stop-the-world, these two tests arequite a lot slower.  Reduce size of the items list from 1000 to 100.
@naschemenaschemeenabled auto-merge (squash)April 28, 2025 20:26
@naschemenascheme merged commite414a2d intopython:mainApr 28, 2025
48 checks passed
nascheme added a commit to nascheme/cpython that referenced this pull requestApr 29, 2025
nascheme added a commit that referenced this pull requestApr 29, 2025
… (gh-133129)This is triggering deadlocks in test_opcache.  SeeGH-133130 for stack trace.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@sergey-miryanovsergey-miryanovsergey-miryanov left review comments

@kumaraditya303kumaraditya303kumaraditya303 left review comments

@colesburycolesburycolesbury approved these changes

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon is a code owner

@mpagempageAwaiting requested review from mpage

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@nascheme@kumaraditya303@colesbury@sergey-miryanov

[8]ページ先頭

©2009-2025 Movatter.jp