gh-103323: Get the "Current" Thread State from a Thread-Local Variable #103324


Conversation

@ericsnowcurrently (Member) commented Apr 6, 2023 (edited)

We replace _PyRuntime.tstate_current with a thread-local variable. As part of this change, we add a _Py_thread_local macro in pyport.h (only for the core runtime) to smooth out the compiler differences. The main motivation here is in support of a per-interpreter GIL, but this change also provides some performance improvement opportunities.

Note that we do not provide a fallback for the thread-local, whether to the old tstate_current or to thread-specific storage (PyThread_tss_*()). If that proves problematic then we can circle back. I consider it unlikely, but will run the buildbots to double-check.
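
As an illustration of the kind of compiler smoothing described above, here is a minimal sketch. It is not the code from this PR: DEMO_THREAD_LOCAL, demo_tstate, demo_current, demo_get_current, demo_swap_current and demo_enter_leave are hypothetical names, and the set of compiler checks is an assumption about what such a macro might test. The hard #error mirrors the "no fallback" decision.

```c
#include <assert.h>
#include <stddef.h>

/* Pick whichever thread-local storage-class spelling the compiler offers.
   DEMO_THREAD_LOCAL is a hypothetical name used only in this sketch. */
#if defined(_MSC_VER)
#  define DEMO_THREAD_LOCAL __declspec(thread)
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define DEMO_THREAD_LOCAL _Thread_local
#elif defined(__GNUC__)
#  define DEMO_THREAD_LOCAL __thread
#else
   /* No fallback: fail the build instead of silently degrading. */
#  error "thread-local storage is required"
#endif

/* Hypothetical stand-in for a thread state structure. */
typedef struct demo_tstate { int id; } demo_tstate;

/* One pointer per OS thread, instead of one shared runtime-wide field. */
static DEMO_THREAD_LOCAL demo_tstate *demo_current = NULL;

static inline demo_tstate *demo_get_current(void)
{
    return demo_current;
}

/* Install a new current state and hand back the previous one. */
static inline demo_tstate *demo_swap_current(demo_tstate *next)
{
    demo_tstate *prev = demo_current;
    demo_current = next;
    return prev;
}

/* Example use: make ts current on the calling thread, then restore. */
static void demo_enter_leave(demo_tstate *ts)
{
    demo_tstate *prev = demo_swap_current(ts);
    assert(demo_get_current() == ts);
    demo_swap_current(prev);
}
```

Because the variable is thread-local, each thread reads its own pointer without touching shared runtime state, which is why the value stays meaningful regardless of which thread holds a lock.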

Also note that this does not change any of the code related to the GILState API, where it uses a thread state stored in thread-specific storage. I suspect we can combine that with _Py_tss_tstate (from here). However, that can be addressed separately and is not urgent (nor critical).

My only remaining uncertainty is with the existing "GIL is held" constraint. With _PyRuntime.tstate_current, it was only guaranteed valid in the thread currently holding the GIL, if any. With this change, it is valid even when the GIL isn't held. I don't see how that would be a problem, but I'm going to double-check anyway.

(While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)

@ericsnowcurrently (Member, Author) commented Apr 7, 2023 (edited)

Per the benchmarks, this change is a little faster (less than 1%) on Linux/GCC.

@ericsnowcurrently marked this pull request as ready for review April 7, 2023 18:14
@ericsnowcurrently added the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label Apr 7, 2023
@bedevere-bot commented:

🤖 New build scheduled with the buildbot fleet by @ericsnowcurrently for commit feb8ef5 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot removed the 🔨 test-with-buildbots (Test PR w/ buildbots; report in status section) label Apr 7, 2023
@markshannon (Member) commented:

It might be worth trying to see what is the performance impact of storing the interpreter state in TLS as well.
No need to do that in this PR, though.

@ericsnowcurrently merged commit f8abfa3 into python:main Apr 24, 2023
@ericsnowcurrently deleted the tstate_current-as-thread_local branch April 24, 2023 17:17
carljm added a commit to carljm/cpython that referenced this pull request Apr 24, 2023
carljm added a commit to carljm/cpython that referenced this pull request Apr 24, 2023
@ericsnowcurrently restored the tstate_current-as-thread_local branch April 25, 2023 15:59
@ericsnowcurrently (Member, Author) commented:

FTR, on Windows this introduced a ~2% performance regression, and on macOS there's a ~3% regression.

Note that these penalties may be partially mitigated by passing the current thread state as an argument throughout the internal C-API (where currently we only do so in some places). The implementation here is also relatively naïve. There are likely opportunities to improve performance via compiler-specific directives.

@ericsnowcurrently deleted the tstate_current-as-thread_local branch April 25, 2023 20:52
static inline PyThreadState*
_PyRuntimeState_GetThreadState(_PyRuntimeState *Py_UNUSED(runtime))
{
    return _PyThreadState_GET();
Member (review comment):

This function no longer makes sense: I wrote PR #104171 to remove it.

@vstinner (Member) commented:

> (While this change was mostly done independently, I did take some inspiration from earlier (~2020) work by @markshannon (main...markshannon:threadstate_in_tls) and @vstinner (#23976).)

I prepared this change for years in advance:

  • Python 3.8:

    • I modified the PyThreadState_GET() macro to make it an alias to the PyThreadState_Get() function, so it's always a function call and no longer a macro, in order to hide implementation details.
    • I added the _PyThreadState_GET() static inline function to the internal C API to get more freedom on its implementation.
    • I modified many internal C functions to pass the tstate variable explicitly (see "Pass the Python thread state explicitly"). For example, I added the _PyErr_Occurred(tstate) function, which replaces PyErr_Occurred() (which has no argument). The idea is that in the future, calling _PyThreadState_GET() may become slower if the value is retrieved from a thread local storage (TLS) variable, which is what this PR does.
    • I modified many functions to pass the "state" more explicitly: runtime, tstate and/or interp. Example: the _PySys_Create(runtime, interp, &sysmod) call in Python/pylifecycle.c.
  • Python 3.9:

    • I converted the _PyThreadState_GET() macro to a static inline function.
    • I added the _PyInterpreterState_GET() static inline function. Maybe the implementation will change in the future to be more efficient (ex: thread local storage?).
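
The explicit-passing pattern from the list above can be sketched roughly as follows. This is an illustration only, loosely modeled on the _PyErr_Occurred(tstate) / PyErr_Occurred() pair: demo_tstate, demo_get_tstate, demo_err_occurred_ts and demo_err_occurred are hypothetical names, and the lookup is a plain static variable standing in for real thread-local storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified thread state. */
typedef struct demo_tstate {
    const char *current_exception;   /* NULL when no error is set */
} demo_tstate;

static demo_tstate demo_main_tstate = { NULL };

/* Stand-in for the (potentially costly) "get current thread state" lookup. */
static demo_tstate *demo_get_tstate(void)
{
    return &demo_main_tstate;
}

/* Internal variant: the caller already has tstate, so no lookup happens. */
static int demo_err_occurred_ts(const demo_tstate *tstate)
{
    assert(tstate != NULL);
    return tstate->current_exception != NULL;
}

/* Public-style variant: takes no argument, so it must look tstate up. */
static int demo_err_occurred(void)
{
    return demo_err_occurred_ts(demo_get_tstate());
}
```

Internal callers that already hold a tstate pointer would call the _ts variant directly, leaving the argument-free wrapper for code that has no thread state at hand.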

@vstinner (Member) commented:

> My only remaining uncertainty is with the existing "GIL is held" constraint. With _PyRuntime.tstate_current, it was only guaranteed valid in the thread currently holding the GIL, if any. With this change, it is valid even when the GIL isn't held. I don't see how that would be a problem, but I'm going to double-check anyway.

In the Python 3.9 and 3.10 era, I moved multiple global states to the "interpreter state" (interp): https://pythondev.readthedocs.io/subinterpreters.html#done These changes caused various crashes in third party C extensions which use the C API with the GIL released (!). For example, calling PyLong_FromLong(1) with the GIL released. This was always illegal and invalid according to the C API documentation. But you know, there are always bugs in the wild. All affected C extensions have been fixed in the meantime. Also, some states were made global again (small integer singletons), and immortal objects also made the situation different.

@vstinner (Member) commented:

> FTR, on Windows this introduced a ~2% performance regression, and on MacOS there's ~3% regression.

It might be interesting to check the hot code calling _PyThreadState_GET() and see if tstate could be passed so that _PyThreadState_GET() is only called once. I'm not sure if it's worth it. Also, in stdlib C extensions, I would prefer to use the internal C API less rather than more :-)
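
A rough sketch of what "only call it once" could look like in a hot loop. Everything here is hypothetical: demo_get_tstate stands in for _PyThreadState_GET(), and the demo_lookups counter exists only to make the number of lookups visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread state with a single field to mutate. */
typedef struct demo_tstate { long counter; } demo_tstate;

static demo_tstate demo_state = { 0 };
static unsigned long demo_lookups = 0;   /* counts simulated lookups */

/* Stand-in for the thread state lookup; increments the counter each call. */
static demo_tstate *demo_get_tstate(void)
{
    demo_lookups++;
    return &demo_state;
}

/* Helper that takes tstate explicitly instead of looking it up. */
static void demo_bump(demo_tstate *tstate)
{
    assert(tstate != NULL);
    tstate->counter++;
}

/* Before: every iteration performs its own lookup. */
static void demo_hot_loop_lookup_each_time(int n)
{
    for (int i = 0; i < n; i++) {
        demo_bump(demo_get_tstate());
    }
}

/* After: look up once, then pass the pointer down. */
static void demo_hot_loop_lookup_once(int n)
{
    demo_tstate *tstate = demo_get_tstate();
    for (int i = 0; i < n; i++) {
        demo_bump(tstate);
    }
}
```

Both loops do the same work; the second performs one lookup regardless of n. Whether that matters in practice depends on how expensive the platform's thread-local access is.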

#  define WITH_THREAD
#endif

#ifdef WITH_THREAD
Member (review comment):

This test is useless. This macro is now always defined. It's only kept for backward compatibility.

Member, Author (review comment):

Doesn't it affect WASM builds?

Member (review comment):

See the code 3 lines above:

#ifndef WITH_THREAD
#  define WITH_THREAD
#endif
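
A small self-contained sketch of why the later #ifdef test cannot fail once the guard above has run. DEMO_THREAD_BRANCH and demo_thread_branch are hypothetical names used only for this demonstration, and static_assert assumes a C11 compiler.

```c
#include <assert.h>

/* The pattern quoted above: define the macro unless it already is. */
#ifndef WITH_THREAD
#  define WITH_THREAD
#endif

/* After that guard, this test can only take the first branch. */
#ifdef WITH_THREAD
enum { DEMO_THREAD_BRANCH = 1 };
#else
enum { DEMO_THREAD_BRANCH = 0 };
#endif

/* Checked at compile time (C11 static_assert from <assert.h>). */
static_assert(DEMO_THREAD_BRANCH == 1, "WITH_THREAD is always defined here");

static int demo_thread_branch(void)
{
    return DEMO_THREAD_BRANCH;
}
```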

#if defined(HAVE_THREAD_LOCAL) && !defined(Py_BUILD_CORE_MODULE)
extern _Py_thread_local PyThreadState *_Py_tss_tstate;
#endif
PyAPI_DATA(PyThreadState *) _PyThreadState_GetCurrent(void);
Member (review comment):

What is the use case for this new _PyThreadState_GetCurrent() function? There is already PyThreadState_Get(). How is it different?

The API to get the current thread state is already complicated and has a complicated history: https://pythondev.readthedocs.io/pystate.html

Member, Author (review comment):

It's there because I couldn't find a way to mix PyAPI_DATA with _Py_thread_local. Looks like that's the same issue you ran into in 2020.
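
One way to picture the workaround: keep the thread-local variable visible only to the core, and give everyone else an ordinary function to call. The sketch below is an assumption-laden illustration rather than the PR's code: demo_tstate, demo_tss_tstate, demo_get_current_exported, demo_get_current, demo_attach and the DEMO_BUILD_CORE_MODULE switch are all hypothetical, and it assumes a C11 compiler for _Thread_local.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread state. */
typedef struct demo_tstate { int id; } demo_tstate;

/* Thread-local owned by the "core"; not exported as data. */
_Thread_local demo_tstate *demo_tss_tstate = NULL;

/* Ordinary function: can be exported like any other symbol. */
demo_tstate *demo_get_current_exported(void)
{
    return demo_tss_tstate;
}

/* Core code reads the variable directly; code built with the hypothetical
   DEMO_BUILD_CORE_MODULE switch goes through the function instead. */
static inline demo_tstate *demo_get_current(void)
{
#if !defined(DEMO_BUILD_CORE_MODULE)
    return demo_tss_tstate;
#else
    return demo_get_current_exported();
#endif
}

/* Example use: both access paths must agree on the calling thread. */
static void demo_attach(demo_tstate *ts)
{
    demo_tss_tstate = ts;
    assert(demo_get_current() == demo_get_current_exported());
}
```

The function call costs a little more than a direct read, but it avoids having to export a thread-local data symbol across a library boundary.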

@vstinner (Member) commented:

Does this change indirectly fix the PyGILState API for subinterpreters? See: #59956

@vstinner (Member) commented:

Thanks @ericsnowcurrently for taking care of this very old project!

It seems like the !defined(Py_BUILD_CORE_MODULE) test in pycore_pystate.h avoids the complicated linker issues that I had on Windows and macOS when I tried a similar change in 2020 (PR #23976).


ZeroIntensity added a commit that referenced this pull request Oct 28, 2025
Python has required thread local support since 3.12 (see GH-103324). By assuming that thread locals are always supported, we can improve the performance of third-party extensions by allowing them to access the attached thread and interpreter states directly.

Reviewers: @vstinner left review comments; awaiting requested review from @markshannon
Assignees: no one assigned
Labels: none yet
Projects: none yet
Milestone: no milestone

4 participants: @ericsnowcurrently, @bedevere-bot, @markshannon, @vstinner
