gh-115999: Implement thread-local bytecode and enable specialization for BINARY_OP #123926


Merged
mpage merged 84 commits into python:main from mpage:gh-115999-thread-local-bytecode
Nov 4, 2024

Conversation

@mpage (Contributor) commented on Sep 10, 2024 (edited)

This PR implements the foundational work necessary for making the specializing interpreter thread-safe in free-threaded builds and enables specialization for BINARY_OP as an end-to-end example. To enable future incremental work, specialization can now be toggled on a per-family basis. Subsequent PRs will enable specialization in free-threaded builds for the remaining families.

In free-threaded builds, each thread specializes a thread-local copy of the bytecode, created on the first RESUME. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Each thread reserves a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and releases the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode, which is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads.
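
For illustration only, here is a minimal standalone C sketch of a lookup along these lines. The type and function names (code_array_t, get_thread_bytecode) and the lazy-copy details are assumptions for the example, not the actual CPython code; growing the array when a thread's index exceeds its current size is not shown.

```c
/* Hypothetical sketch of a co_tlbc-style lookup; names and layout are
 * illustrative and do not claim to match CPython's implementation. */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t size;       /* number of slots in entries[]                */
    char  *entries[];  /* entries[0] always points at the "main" copy */
} code_array_t;

/* Each thread owns a globally unique index valid for every code
 * object's array; the thread-local copy is made lazily (in the real
 * design, on the thread's first RESUME). */
static char *
get_thread_bytecode(code_array_t *tlbc, size_t thread_index, size_t nbytes)
{
    if (thread_index >= tlbc->size) {
        return tlbc->entries[0];  /* array growth not shown in this sketch */
    }
    if (tlbc->entries[thread_index] == NULL) {
        char *copy = malloc(nbytes);
        if (copy == NULL) {
            return NULL;
        }
        memcpy(copy, tlbc->entries[0], nbytes);  /* duplicate the main bytecode */
        tlbc->entries[thread_index] = copy;
    }
    return tlbc->entries[thread_index];
}
```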

Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization.

Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.
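
As a rough sketch of that rule, the following standalone C11 snippet uses a <stdatomic.h> compare-and-swap rather than CPython's internal atomic helpers; the function name and opcode parameters are made up for the example.

```c
/* Standalone C11 sketch (not CPython's internal atomics API): write a
 * specialized opcode only if the instruction still holds the generic
 * opcode, so a concurrent instrumentation write is never clobbered. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static bool
try_specialize_opcode(_Atomic uint8_t *opcode,
                      uint8_t generic, uint8_t specialized)
{
    /* Re-read the opcode atomically; instrumentation may have replaced
     * it since we decided to specialize this instruction. */
    uint8_t expected = atomic_load_explicit(opcode, memory_order_relaxed);
    if (expected != generic) {
        return false;  /* instrumented (or otherwise changed) concurrently */
    }
    /* Compare-and-swap succeeds only if the opcode is still unchanged. */
    return atomic_compare_exchange_strong_explicit(
        opcode, &expected, specialized,
        memory_order_relaxed, memory_order_relaxed);
}
```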

- Fix a few places where we were not using atomics to (de)instrument opcodes.
- Fix a few places where we weren't using atomics to reset adaptive counters.
- Remove some redundant non-atomic resets of adaptive counters that presumably snuck in as merge artifacts of python#118064 and python#117144 landing close together.
Read the opcode atomically; the interpreter may be specializing it.
@mpage (Contributor, Author)

@markshannon - Would you take a look at this, please?

@markshannon (Member)

I'm still concerned about not counting the tlbc memory blocks in the refleaks test.

Maybe you could count them separately, and still check that there aren't too many leaked, but be a bit more relaxed about the counts for tlbc than for other blocks?


@mpage (Contributor, Author)

!buildbot nogil refleak

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @mpage for commit 07f9140 🤖

The command will test the builders whose names match the following regular expression: nogil refleak

The builders matched are:

  • AMD64 CentOS9 NoGIL Refleaks PR
  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR

@mpage (Contributor, Author)

I'm still concerned about not counting the tlbc memory blocks in the refleaks test.

Maybe you could count them separately, and still check that there aren't too many leaked, but be a bit more relaxed about the counts for tlbc than for other blocks?

@markshannon - That would work, but I opted for clearing the cached TLBC for threads that aren't currently in use when we clear other internal caches. This should still catch leaks, doesn't require modifying refleaks.py, and is the same approach we use for tier2. Please have a look.

# code objects is a large fraction of the total number of
# references, this can cause the total number of allocated
# blocks to exceed the total number of references.
if not support.Py_GIL_DISABLED:
@markshannon (Member) commented on Oct 29, 2024 (edited)


Now that we can free the unused tlbcs, can we replace this with sys._clear_internal_caches()?

@mpage (Contributor, Author) commented on Oct 29, 2024 (edited)

Unfortunately, no. The assertion seems to be very sensitive to which kinds of objects are on the heap, as well as to the number of non-reference-counted allocations (blocks) per object. With the introduction of TLBC there is at least one additional block allocated per code object that is not reference counted, the _PyCodeArray, which is present even if we free the unused TLBCs. Its presence is enough to trigger the assertion.

This assertion feels pretty brittle and I'd be in favor of removing it, but that's probably worth doing in a separate PR.

Member

Maybe replace it with a more meaningful test rather than remove it. But in another PR.

@markshannon (Member) left a comment (edited)

Looks good.

One question. Can we prefix the test for leaking blocks with sys._clear_internal_caches() instead of making it conditional on not using free-threading?

@mpage (Contributor, Author) commented on Oct 29, 2024 (edited)

One question. Can we prefix the test for leaking blocks with sys._clear_internal_caches() instead of making it conditional on not using free-threading?

@markshannon - Unfortunately that doesn't help. See my reply inline.

@markshannon self-requested a review on October 29, 2024 16:54
@markshannon (Member) left a comment

I still have concerns about memory use, but we can iterate on that in subsequent PRs.

@mpage (Contributor, Author)

The JIT failures appear on main and are unrelated to this PR.

@mpage merged commit 2e95c5b into python:main on Nov 4, 2024
51 of 57 checks passed
@mpage deleted the gh-115999-thread-local-bytecode branch on November 4, 2024 19:14
@encukou (Member)

After this PR was merged, test_gdb started failing; see e.g. https://buildbot.python.org/#/builders/506/builds/9149
Do you think it's possible to fix this in a day?

@Yhg1s (Member)

Looks like the problem is only in --enable-shared builds, and it's because we're now looking up _PyInterpreterFrame too early (before the .so file is loaded). I'll have a fix in a few minutes.

@Yhg1s (Member)

PR #126440 should fix the failure.


picnixz pushed a commit to picnixz/cpython that referenced this pull request on Dec 8, 2024: gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (python#123926)
ebonnal pushed a commit to ebonnal/cpython that referenced this pull request on Jan 12, 2025: gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (python#123926)
Reviewers

@JelleZijlstra left review comments

@Yhg1s approved these changes

@markshannon approved these changes

Awaiting requested review from @ericsnowcurrently (code owner)

Awaiting requested review from @brandtbucher

Awaiting requested review from @corona10

Awaiting requested review from @Fidget-Spinner

Assignees

@mpage

Projects
None yet
Milestone
No milestone

8 participants
@mpage, @Fidget-Spinner, @bedevere-bot, @JelleZijlstra, @corona10, @markshannon, @encukou, @Yhg1s
