Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-115999: Add free-threaded specialization for FOR_ITER#128798

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Yhg1s merged 15 commits intopython:mainfromYhg1s:for-iter-spec
Mar 12, 2025

Conversation

Yhg1s
Copy link
Member

@Yhg1sYhg1s commentedJan 13, 2025
edited by bedevere-appbot
Loading

Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)

@Yhg1s
Copy link
MemberAuthor

I still need to add tests for the specialization (rather than the thread-safety, which are in#128637), but otherwise the PR is good enough to review I think.

@mpage
Copy link
Contributor

Why not use the approach that you suggested in discord where we only specialize for the case where the iterator is uniquely referenced by the current thread? It seems like that should cover the overwhelmingly common case, would be simpler to implement, and the resulting instructions would be faster.

@Yhg1s
Copy link
MemberAuthor

I realised that approach wouldn't work for the list referenced by the list iterator (which is the bulk of the work for list iteration) because we're not doing any refcount operations on it (on purpose) and_PyObject_IsUniquelyReferenced() will only do the right thing if all threads hold strong references... and then I think I forgot we should still be able to use that approach for the iterators themselves. I'll see if I can poke holes into that idea tomorrow.

mpage reacted with thumbs up emoji

…ly touniquely referenced iterators. This handles the common case of 'for item inseq' (where 'seq' is a list, tuple or range object) and 'for item ingenerator_function()', but not, for example, 'g = gen(...); for item in g:'.
@Yhg1s
Copy link
MemberAuthor

I've added a test that exercises the deopt paths, although for it to pass with ThreadSanitizer relies on#128637 going in first. The test is a little wacky because for list/range/tuple iterators it's not easy to leak the uniquely referenced iterators. (For generators it could just use a weakref.) I wrote the test to make sure the deopt paths were actually safe, but I'm not sure if long-term those tests make sense. We could just keep them around while we fiddle with specialization, but drop them once we're sure the iterators can't leak (or we change the entire approach to iterator specialization.)

@Yhg1sYhg1s marked this pull request as ready for reviewJanuary 21, 2025 15:21
@mpagempage requested a review fromcolesburyJanuary 23, 2025 18:35
Comment on lines 3114 to 3115
// For free-threaded Python, the loop exit can happen at any point during item
// retrieval, so separate ops don't make much sense.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It doesn't seem like this comment applies to the tuple specialization. Can you delete it if not?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done.

Comment on lines 3031 to 3032
// For free-threaded Python, the loop exit can happen at any point during item
// retrieval, so separate ops don't make much sense.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Can you expand a little bit on why this doesn't make sense? It'd be nice to keep the structure of the ops the same between the two builds.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done.

@@ -2055,7 +2055,7 @@ const struct opcode_metadata _PyOpcode_opcode_metadata[266] = {
[FORMAT_WITH_SPEC] = { true, INSTR_FMT_IX, HAS_ERROR_FLAG | HAS_ESCAPES_FLAG },
[FOR_ITER] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_ERROR_FLAG | HAS_ERROR_NO_POP_FLAG | HAS_ESCAPES_FLAG },
[FOR_ITER_GEN] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_DEOPT_FLAG },
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_EXIT_FLAG },
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG |HAS_DEOPT_FLAG |HAS_EXIT_FLAG | HAS_ESCAPES_FLAG },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why does this need theHAS_ESCAPES_FLAG? How does it escape?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The mechanism for fetching an item from a shared list includes calling_Py_TryIncrefCompare, which has a path which DECREFs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Which path? I'm confused as why an incref would need to decref.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

When the reference is removed from the array that backs the list between the time it's retrieved and it's returned.

/* Tries to incref the object op and ensures that *src still points to it. */
staticinlineint
_Py_TryIncrefCompare(PyObject**src,PyObject*op)
{
if (_Py_TryIncrefFast(op)) {
return1;
}
if (!_Py_TryIncRefShared(op)) {
return0;
}
if (op!=_Py_atomic_load_ptr(src)) {
Py_DECREF(op);
return0;
}
return1;
}

Added in#114512.

Copy link
Member

@markshannonmarkshannonJan 28, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't see how that works. What happens if*src is modified afterop != _Py_atomic_load_ptr(src) but before the function returns?

Regardless, we want to keepFOR_ITER_LIST non-escaping in the default build.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@colesbury do you remember why lists use_Py_TryXGetRef (in list_get_item_ref) or why it matters that the src ptr still refers to the same item? Can the object be invalid in some way that still requires us to DECREF it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There are two hazards that_Py_TryXGetRef handles:

  1. The objectop may be deallocated between the initial load and the incref. The try-incref, in cooperation with the allocator, handles this case. It returns zero ifop has zero refcount.
  2. The memory block forop may be deallocatedand reallocated for a different object in between the initial load and incref. In that case the try-incref succeeds, but the subsequent check fails and we need to decrefop.

It does not matter if*src is modified afterop != _Py_atomic_load_ptr(src).op will still be a valid reference to an object that was the next element in the list at some point during the operation. It's always been possible for the list to be concurrently modified between the execution ofFOR_ITER and subsequent code that uses the result.

Seehttps://peps.python.org/pep-0703/#optimistically-avoiding-locking

Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build

Is there any performance reason to do so?

Copy link
Member

@brandtbucherbrandtbucherJan 29, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Regardless, we want to keep FOR_ITER_LIST non-escaping in the default build

Is there any performance reason to do so?

Just to follow up: this does make JIT codeslightly worse for traces coveringfor loops over lists. Since the_ITER_NEXT_LIST at the top of the loop is now escaping, this will add a new_CHECK_VALIDITY uop to the start of each loop body where none exists currently.

Consider this loop:

deff():x=0foriinlist(range(10_000)):x+=i

This change increases the number of validity checks per loop from 2 to 3. Before:

_MAKE_WARM_SET_IP_CHECK_PERIODIC_CHECK_VALIDITY <------------- OLD_ITER_CHECK_LIST_GUARD_NOT_EXHAUSTED_LIST_ITER_NEXT_LIST_SET_IP_STORE_FAST_1_CHECK_VALIDITY <------------- OLD_LOAD_FAST_0_LOAD_FAST_1_GUARD_BOTH_INT_BINARY_OP_ADD_INT_SET_IP_STORE_FAST_0_JUMP_TO_TOP

After:

_MAKE_WARM_SET_IP_CHECK_PERIODIC_CHECK_VALIDITY <------------- OLD_ITER_CHECK_LIST_GUARD_NOT_EXHAUSTED_LIST_ITER_NEXT_LIST_CHECK_VALIDITY_AND_SET_IP <-- NEW_STORE_FAST_1_CHECK_VALIDITY <------------- OLD_LOAD_FAST_0_LOAD_FAST_1_GUARD_BOTH_INT_BINARY_OP_ADD_INT_SET_IP_STORE_FAST_0_JUMP_TO_TOP

colesbury reacted with thumbs up emoji
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Just to clarify:
FOR_ITER_LIST is escaping because_PyList_GetItemRefNoLock is escaping, and_PyList_GetItemRefNoLock is escaping because_Py_TryIncrefCompareStackRef is escaping.

Either we need to find an alternative approach or make_Py_TryIncrefCompareStackRef non-escaping.

@Yhg1s
Copy link
MemberAuthor

@markshannon This is the FOR_ITER specialization you wanted to take another look at.

Copy link
Member

@markshannonmarkshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's a shame aboutFOR_ITER_LIST becoming escaping.

We need to make_Py_TryIncrefCompareStackRef non-escaping, but not in this PR.
Otherwise we'll end up with everything escaping, as more code uses_Py_TryIncrefCompareStackRef.

@@ -2055,7 +2055,7 @@ const struct opcode_metadata _PyOpcode_opcode_metadata[266] = {
[FORMAT_WITH_SPEC] = { true, INSTR_FMT_IX, HAS_ERROR_FLAG | HAS_ESCAPES_FLAG },
[FOR_ITER] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_ERROR_FLAG | HAS_ERROR_NO_POP_FLAG | HAS_ESCAPES_FLAG },
[FOR_ITER_GEN] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_DEOPT_FLAG },
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG | HAS_EXIT_FLAG },
[FOR_ITER_LIST] = { true, INSTR_FMT_IBC, HAS_ARG_FLAG | HAS_JUMP_FLAG |HAS_DEOPT_FLAG |HAS_EXIT_FLAG | HAS_ESCAPES_FLAG },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Just to clarify:
FOR_ITER_LIST is escaping because_PyList_GetItemRefNoLock is escaping, and_PyList_GetItemRefNoLock is escaping because_Py_TryIncrefCompareStackRef is escaping.

Either we need to find an alternative approach or make_Py_TryIncrefCompareStackRef non-escaping.

@Yhg1s
Copy link
MemberAuthor

We need to make_Py_TryIncrefCompareStackRef non-escaping, but not in this PR. Otherwise we'll end up with everything escaping, as more code uses_Py_TryIncrefCompareStackRef.

I think we discussed this in the weekly meeting a month or two ago... I think the only way to avoid it is to delay DECREF'ing objects we've accidentally incorrectly INCREF'ed -- that is to say, objects that were newly allocated from the same memory as the object we were trying to INCREF but we lost the INCREF/DECREF race with another thread on. I think in this case that might not be too bad: if the INCREF fails we deopt anyway, and we could delay that DECREF until that deopt. For other cases where we need to TryIncref, I'm not sure of the impact of delaying (plus, we'd need some mechanism for the delay).

@Yhg1s
Copy link
MemberAuthor

I think the only way to avoid it is to delay DECREF'ing objects we've accidentally incorrectly INCREF'ed -- that is to say, objects that were newly allocated from the same memory as the object we were trying to INCREF but we lost the INCREF/DECREF race with another thread on. I think in this case that might not be too bad: if the INCREF fails we deopt anyway, and we could delay that DECREF until that deopt.

@markshannon let me know if you want that to happen in this PR (but I'd prefer it not).

@markshannon
Copy link
Member

No need to do it in this PR.

@Yhg1sYhg1s merged commitde2f7da intopython:mainMar 12, 2025
76 checks passed
plashchynski pushed a commit to plashchynski/cpython that referenced this pull requestMar 17, 2025
…n#128798)Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
seehwan pushed a commit to seehwan/cpython that referenced this pull requestApr 16, 2025
…n#128798)Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@mpagempagempage left review comments

@colesburycolesburycolesbury left review comments

@brandtbucherbrandtbucherbrandtbucher left review comments

@markshannonmarkshannonmarkshannon approved these changes

Assignees
No one assigned
Labels
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

5 participants
@Yhg1s@mpage@markshannon@colesbury@brandtbucher

[8]ページ先頭

©2009-2025 Movatter.jp