Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-117578: Fix inlining regression in PyType_GetModuleByDef()#123100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
neonene wants to merge4 commits intopython:mainfromneonene:bydef-inline
Closed

gh-117578: Fix inlining regression in PyType_GetModuleByDef()#123100

neonene wants to merge4 commits intopython:mainfromneonene:bydef-inline

Conversation

@neonene
Copy link
Contributor

@neoneneneonene commentedAug 17, 2024
edited
Loading

Onmain and3.13, there are cases where theget_module_by_def function intypeobject.c is not inlined in its wrapper functions:

WrappersWindowscallee:get_module_by_def()
PyType_GetModuleByDef()Release/Ob2:called/Ob3:inlined
PGOinlined
_PyType_GetModuleByDef2()Release/Ob2:called/Ob3:inlined
PGOcalled

Non-builtin modules can have extra function-call overheads, where the wrappers cannot be inlined.

This PR specifiesPy_ALWAYS_INLINE to the callee.

cc@encukou

@neoneneneonene changed the titlegh-117578: Fix inlining regression in PyType_GetModuleByDef() familygh-117578: Fix inlining regression in PyType_GetModuleByDef()Aug 17, 2024
@encukou
Copy link
Member

Hm, it doesn't sound right to override profile-guided optimization, especially sincetest_decimal (the only current caller of_PyType_GetModuleByDef2) is in the PGO test set.
Does this PR have a significant performance impact?

@neonene
Copy link
ContributorAuthor

neonene commentedAug 20, 2024
edited
Loading

It is mentioned onthe faster-cpython repo that thetelco test has slowed down a lot.

According to MSVC, the module state access counts were:

Function / breakdown     entry-cnt  alternative access-----------------------  ---------  ---------------------------PyType_GetModuleByDef()    6852643    convert_op             2971848  via context object    PyDecType_New          1651188  via context object (partial)    dec_addstatus          1651188  via context object (partial)    current_context        1486221  via context object (partial)    dec_mpd_qquantize       247731  METH_METHOD    ctx_mpd_qquantize       165000  METH_METHOD    ..._PyType_GetModuleByDef2()  1073193    nm_mpd_qadd             660462    nm_mpd_qmul             412731

Tested with the/Ob3 option, switching the inlining specifier:f740a5d. My Release/PGO builds on Windows get slower using TLS version ofPyThreadState_Get(), which is also observed at#103324 (comment). If *nix OSes are in good health with TLS, I guess I also need to run thetelco with a good condition (without TLS):

sub
GetModuleByDefcallinlineinline
GetModuleByDef2callcallinline
normal 3.14perf(the higher, the faster)
1.00x(base)1.01x
1.03x1.05x1.08xrespect alternative
less TLS overheadperf(experiment)
1.05x1.05x1.06x
1.08x1.08x1.08xrespect alternative
module state accessnormalTLS-less
current(base)1.05x
This PR1.01x1.06x
PyThreadState_Get()1.05x1.04xexample patches
PR with alternatives1.08x1.08x
global state1.08x1.11x
static type (GC unused)1.14x1.14xtaken from 3.12

This patch would need to be applied if we wanted as much speed as the global state access on Windows, which has little effect alone (1%) for some reason.

@neonene
Copy link
ContributorAuthor

Windows PGO:
- telco: 9.21 ms +- 0.12 ms -> 9.02 ms +- 0.09 ms: 1.02x faster

@neonene
Copy link
ContributorAuthor

neonene commentedAug 21, 2024
edited
Loading

Is it acceptable thattest_decimal.py has a test case like below, instead of touching the C code?

@requires_cdecimalclassCArithmeticOperatorsTest(ArithmeticOperatorsTest,unittest.TestCase):    ...@unittest.skipIf(nottest.support.PGO,'PGO training only')deftest_excecise_binop(self):Decimal=self.decimal.Decimald=Decimal('11.1')foriinrange(500000):1+d# at least 300000 times

@neonene
Copy link
ContributorAuthor

I'll tryPyType_GetBaseByToken() version.

@neonene
Copy link
ContributorAuthor

Closing in favor of proposing thePyType_GetBaseByToken() version, which can supersede_PyType_GetModuleByDef2() on PGO and Relase(/Ob3) builds.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon is a code owner

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

@neonene@encukou

[8]ページ先頭

©2009-2025 Movatter.jp