Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-98003: Inline call frames for CALL_FUNCTION_EX#98004

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-SpinnerFidget-Spinner commentedOct 6, 2022
edited by bedevere-bot
Loading

@brandtbucher
Copy link
Member

Looks like not much difference:

Benchmarks with tag 'apps':===========================Slower (2):- html5lib: 59.3 ms +- 2.4 ms -> 60.4 ms +- 2.4 ms: 1.02x slower- tornado_http: 92.9 ms +- 1.3 ms -> 93.4 ms +- 1.4 ms: 1.01x slowerFaster (1):- 2to3: 247 ms +- 1 ms -> 246 ms +- 0 ms: 1.00x fasterBenchmark hidden because not significant (1): chameleonGeometric mean: 1.01x slowerBenchmarks with tag 'math':===========================Slower (1):- pidigits: 193 ms +- 0 ms -> 199 ms +- 0 ms: 1.03x slowerFaster (2):- nbody: 96.2 ms +- 2.0 ms -> 94.9 ms +- 1.9 ms: 1.01x faster- float: 71.7 ms +- 0.9 ms -> 71.3 ms +- 1.0 ms: 1.01x fasterGeometric mean: 1.00x slowerBenchmarks with tag 'regex':============================Slower (2):- regex_effbot: 3.33 ms +- 0.02 ms -> 3.39 ms +- 0.02 ms: 1.02x slower- regex_v8: 21.1 ms +- 0.1 ms -> 21.3 ms +- 0.1 ms: 1.01x slowerFaster (2):- regex_dna: 200 ms +- 1 ms -> 198 ms +- 1 ms: 1.01x faster- regex_compile: 128 ms +- 1 ms -> 127 ms +- 1 ms: 1.01x fasterGeometric mean: 1.00x slowerBenchmarks with tag 'serialize':================================Slower (2):- pickle_list: 3.92 us +- 0.03 us -> 4.11 us +- 0.10 us: 1.05x slower- pickle_dict: 30.3 us +- 0.1 us -> 31.1 us +- 0.1 us: 1.03x slowerFaster (3):- xml_etree_iterparse: 105 ms +- 2 ms -> 101 ms +- 1 ms: 1.03x faster- pickle: 10.3 us +- 0.3 us -> 10.1 us +- 0.1 us: 1.02x faster- xml_etree_process: 53.2 ms +- 0.5 ms -> 52.4 ms +- 0.5 ms: 1.01x fasterBenchmark hidden because not significant (8): json_dumps, json_loads, pickle_pure_python, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse, xml_etree_generateGeometric mean: 1.00x fasterBenchmarks with tag 'startup':==============================Slower (2):- python_startup_no_site: 6.08 ms +- 0.01 ms -> 6.08 ms +- 0.01 ms: 1.00x slower- python_startup: 8.45 ms +- 0.01 ms -> 8.45 ms +- 0.01 ms: 1.00x slowerGeometric mean: 1.00x slowerBenchmarks with tag 'template':===============================Slower (3):- mako: 9.36 ms +- 0.05 ms -> 9.69 ms +- 0.04 ms: 1.04x slower- genshi_xml: 48.5 ms +- 0.4 ms -> 49.5 ms +- 0.6 ms: 1.02x slower- genshi_text: 20.9 ms +- 0.3 ms -> 21.2 ms +- 0.3 ms: 1.02x slowerFaster (1):- django_template: 32.5 ms +- 0.4 ms -> 32.2 ms +- 0.3 ms: 1.01x fasterGeometric mean: 1.02x slowerAll benchmarks:===============Slower (31):- pickle_list: 3.92 us +- 0.03 us -> 4.11 us +- 0.10 us: 1.05x slower- unpack_sequence: 45.7 ns +- 0.9 ns -> 47.9 ns +- 0.7 ns: 1.05x slower- mako: 9.36 ms +- 0.05 ms -> 9.69 ms +- 0.04 ms: 1.04x slower- deepcopy_reduce: 2.87 us +- 0.04 us -> 2.97 us +- 0.04 us: 1.04x slower- go: 132 ms +- 1 ms -> 136 ms +- 1 ms: 1.03x slower- deepcopy_memo: 33.2 us +- 0.3 us -> 34.3 us +- 0.4 us: 1.03x slower- pidigits: 193 ms +- 0 ms -> 199 ms +- 0 ms: 1.03x slower- pickle_dict: 30.3 us +- 0.1 us -> 31.1 us +- 0.1 us: 1.03x slower- json: 4.62 ms +- 0.08 ms -> 4.73 ms +- 0.12 ms: 1.03x slower- genshi_xml: 48.5 ms +- 0.4 ms -> 49.5 ms +- 0.6 ms: 1.02x slower- html5lib: 59.3 ms +- 2.4 ms -> 60.4 ms +- 2.4 ms: 1.02x slower- regex_effbot: 3.33 ms +- 0.02 ms -> 3.39 ms +- 0.02 ms: 1.02x slower- logging_silent: 90.3 ns +- 0.9 ns -> 91.9 ns +- 1.8 ns: 1.02x slower- richards: 43.0 ms +- 1.3 ms -> 43.6 ms +- 1.1 ms: 1.02x slower- genshi_text: 20.9 ms +- 0.3 ms -> 21.2 ms +- 0.3 ms: 1.02x slower- generators: 73.3 ms +- 0.5 ms -> 74.3 ms +- 0.2 ms: 1.01x slower- crypto_pyaes: 73.0 ms +- 0.6 ms -> 73.9 ms +- 0.5 ms: 1.01x slower- deepcopy: 324 us +- 3 us -> 327 us +- 4 us: 1.01x slower- sqlglot_parse: 1.32 ms +- 0.01 ms -> 1.33 ms +- 0.01 ms: 1.01x slower- logging_format: 6.32 us +- 0.07 us -> 6.38 us +- 0.12 us: 1.01x slower- regex_v8: 21.1 ms +- 0.1 ms -> 21.3 ms +- 0.1 ms: 1.01x slower- thrift: 736 us +- 13 us -> 743 us +- 17 us: 1.01x slower- chaos: 67.4 ms +- 0.8 ms -> 67.9 ms +- 0.9 ms: 1.01x slower- sympy_sum: 162 ms +- 2 ms -> 163 ms +- 1 ms: 1.01x slower- tornado_http: 92.9 ms +- 1.3 ms -> 93.4 ms +- 1.4 ms: 1.01x slower- coroutines: 23.6 ms +- 0.2 ms -> 23.8 ms +- 0.1 ms: 1.01x slower- aiohttp: 1.00 ms +- 0.00 ms -> 1.01 ms +- 0.00 ms: 1.01x slower- dulwich_log: 60.9 ms +- 0.4 ms -> 61.1 ms +- 0.4 ms: 1.00x slower- gunicorn: 1.09 ms +- 0.01 ms -> 1.09 ms +- 0.01 ms: 1.00x slower- python_startup_no_site: 6.08 ms +- 0.01 ms -> 6.08 ms +- 0.01 ms: 1.00x slower- python_startup: 8.45 ms +- 0.01 ms -> 8.45 ms +- 0.01 ms: 1.00x slowerFaster (29):- mdp: 2.70 sec +- 0.01 sec -> 2.53 sec +- 0.01 sec: 1.07x faster- fannkuch: 382 ms +- 4 ms -> 366 ms +- 5 ms: 1.05x faster- nqueens: 80.4 ms +- 1.0 ms -> 77.7 ms +- 1.2 ms: 1.03x faster- xml_etree_iterparse: 105 ms +- 2 ms -> 101 ms +- 1 ms: 1.03x faster- scimark_sparse_mat_mult: 4.10 ms +- 0.08 ms -> 4.00 ms +- 0.05 ms: 1.03x faster- spectral_norm: 94.1 ms +- 2.9 ms -> 91.8 ms +- 1.2 ms: 1.03x faster- telco: 6.50 ms +- 0.11 ms -> 6.36 ms +- 0.12 ms: 1.02x faster- scimark_fft: 311 ms +- 5 ms -> 305 ms +- 3 ms: 1.02x faster- pickle: 10.3 us +- 0.3 us -> 10.1 us +- 0.1 us: 1.02x faster- xml_etree_process: 53.2 ms +- 0.5 ms -> 52.4 ms +- 0.5 ms: 1.01x faster- pathlib: 17.7 ms +- 0.2 ms -> 17.5 ms +- 0.3 ms: 1.01x faster- nbody: 96.2 ms +- 2.0 ms -> 94.9 ms +- 1.9 ms: 1.01x faster- sympy_str: 282 ms +- 6 ms -> 279 ms +- 4 ms: 1.01x faster- regex_dna: 200 ms +- 1 ms -> 198 ms +- 1 ms: 1.01x faster- scimark_sor: 103 ms +- 2 ms -> 102 ms +- 1 ms: 1.01x faster- async_tree_none: 517 ms +- 10 ms -> 512 ms +- 9 ms: 1.01x faster- django_template: 32.5 ms +- 0.4 ms -> 32.2 ms +- 0.3 ms: 1.01x faster- sympy_expand: 453 ms +- 7 ms -> 450 ms +- 6 ms: 1.01x faster- async_tree_io: 1.31 sec +- 0.01 sec -> 1.30 sec +- 0.01 sec: 1.01x faster- meteor_contest: 103 ms +- 1 ms -> 102 ms +- 2 ms: 1.01x faster- regex_compile: 128 ms +- 1 ms -> 127 ms +- 1 ms: 1.01x faster- scimark_monte_carlo: 66.2 ms +- 1.0 ms -> 65.8 ms +- 0.9 ms: 1.01x faster- float: 71.7 ms +- 0.9 ms -> 71.3 ms +- 1.0 ms: 1.01x faster- sqlite_synth: 2.59 us +- 0.02 us -> 2.58 us +- 0.03 us: 1.00x faster- 2to3: 247 ms +- 1 ms -> 246 ms +- 0 ms: 1.00x faster- hexiom: 5.99 ms +- 0.04 ms -> 5.97 ms +- 0.04 ms: 1.00x faster- sqlglot_normalize: 104 ms +- 1 ms -> 104 ms +- 1 ms: 1.00x faster- sympy_integrate: 20.3 ms +- 0.1 ms -> 20.3 ms +- 0.1 ms: 1.00x faster- sqlglot_optimize: 50.9 ms +- 0.3 ms -> 50.8 ms +- 0.2 ms: 1.00x fasterBenchmark hidden because not significant (26): async_tree_cpu_io_mixed, async_tree_memoization, chameleon, coverage, deltablue, flaskblogging, json_dumps, json_loads, logging_simple, mypy, pickle_pure_python, pprint_safe_repr, pprint_pformat, pycparser, pyflate, pylint, raytrace, scimark_lu, sqlalchemy_declarative, sqlalchemy_imperative, sqlglot_transpile, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse, xml_etree_generateGeometric mean: 1.00x slower

Perhaps we should try inlining allCALL_FUNCTION_EX, rather than specializing? Your call.

Fidget-Spinner reacted with heart emoji

@Fidget-SpinnerFidget-Spinner changed the titlegh-98003: Specialize CALL_FUNCTION_EXgh-98003: Inline call frames for CALL_FUNCTION_EXOct 7, 2022
@Fidget-Spinner
Copy link
MemberAuthor

@markshannon gentle ping for a review please.

@markshannon
Copy link
Member

The approach I would prefer is to:

  1. RefactorPyFunction_Type.tp_call so that it creates a frame, then callsPyEval_EvalFrame()
  2. Use the factored out code for creating the frame inCALL_FUNCTION_EX

There is a lot unnecessary dismantling of tuples and dicts in the current approach. This is not a problem with this PR, but I think it needs to be sorted out before "inlining" the call inCALL_FUNCTION_EX

IdeallyCALL_FUNCTION_EX would look something like this:

CALL_FUNCTION_EX:   if (Py_IS_TYPE(func, &PyFunction_Type) &&        tstate->interp->eval_frame == NULL &&        ((PyFunctionObject *)func)->vectorcall == _PyFunction_Vectorcall) {                _PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit_TD(                    tstate, (PyFunctionObject *)function, locals, callargs, kwargs);                );                Py_DECREF(callargs);                Py_DECREF(kwargs);                if (new_frame == NULL) {                    goto error;                }                /* Enter frame */               ...   }   ...

Where_PyEvalFramePushAndInit_TD pushes the frame, and calls the factored out function for initializing the frame from*args, **kwargs.

@Fidget-Spinner
Copy link
MemberAuthor

The approach I would prefer is to:

  1. RefactorPyFunction_Type.tp_call so that it creates a frame, then callsPyEval_EvalFrame()
  2. Use the factored out code for creating the frame inCALL_FUNCTION_EX

I'm confused. The first step is to create a frame, but then in your example code you show it creating another frame again. Why does it do that?

@markshannon
Copy link
Member

Because we don't really create a frame. We push a frame and then initialize it.

Pushing a frame is simple. It is initializing that is complex and we want to factor out.

@brandtbucher
Copy link
Member

Just curious, has this stalled out for now? At the very least, the changes inceval.c will need to move to the newbytecodes.c, and you'll probably want to use the newDISPATCH_INLINED macro for inlining the call.

Totally get it if you're busy with school or something right now though! :) Just making sure that you aren't blocked on reviews or design discussions. If so, maybe we can move this forward (I'd really like to see it in)!

@Fidget-Spinner
Copy link
MemberAuthor

Totally get it if you're busy with school or something right now though! :) Just making sure that you aren't blocked on reviews or design discussions. If so, maybe we can move this forward (I'd really like to see it in)!

Yeah my exams end next week. I plan to start contributing actively again after then!

brandtbucher reacted with thumbs up emojiAlexWaygood reacted with heart emoji

@Fidget-Spinner
Copy link
MemberAuthor

@brandtbucher and@markshannon I think this is ready for your reviews (when you have time of course).

Copy link
Member

@markshannonmarkshannon left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This approach creates a lot of intermediate objects.
I'm not sure if that is a result of the transformations in this PR, or if the original code also did created those intermediate objects.

If the objects were already being created, then maybe we should streamline it in another PR.
@Fidget-Spinner what do you think?

int code_flags = ((PyCodeObject *)PyFunction_GET_CODE(func))->co_flags;
PyObject *locals = code_flags & CO_OPTIMIZED ? NULL : Py_NewRef(PyFunction_GET_GLOBALS(func));

_PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit_Ex(tstate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it would make sense to move the unpacking of the tuple into_PyEvalFramePushAndInit_Ex.

_PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit_Ex(tstate, func, locals, callargs, kwargs).

CALL_FUNCTION_EX is quite a large instruction. Pushing more code into_PyEvalFramePushAndInit_Ex would keep it smaller.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Unpacking which tuple?

If you mean move the code above in, that would break convention with the rest of the code base. The other inline calls do these

                int code_flags = ((PyCodeObject *)PyFunction_GET_CODE(func))->co_flags;                PyObject *locals = code_flags & CO_OPTIMIZED ? NULL : Py_NewRef(PyFunction_GET_GLOBALS(func));

before creating the new frame.

Py_INCREF(PyTuple_GET_ITEM(callargs, i));
}
}
_PyInterpreterFrame *new_frame = _PyEvalFramePushAndInit(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Rather than transformingcallargs andkwargs into a shape suitable for_PyEvalFramePushAndInit, it might make sense to push the frame, then unpack the tuple and dict into the the frame without the intermediate objects.
It complicates the errror handling a bit, and the frame will need to be cleaned up if there is an error.

Copy link
MemberAuthor

@Fidget-SpinnerFidget-SpinnerDec 9, 2022
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I can work on this in another PR.

@markshannon
Copy link
Member

Thanks for this PR. Avoiding C stack consumption for these calls is definitely something we want.

@Fidget-Spinner
Copy link
MemberAuthor

This approach creates a lot of intermediate objects.
I'm not sure if that is a result of the transformations in this PR, or if the original code also did created those intermediate objects.

If the objects were already being created, then maybe we should streamline it in another PR.
@Fidget-Spinner what do you think?

Yes these are the semantics of the original CALL_FUNCTION_EX. Most of these were copy pasted. I was surprised at how allocation heavy CALL_FUNCTION_EX is and have been thinking of ways to improve it.

Copy link
Member

@brandtbucherbrandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Not an expert on the call sequence, but the code generally looks good to me! I'd be excited to see this in 3.12.

I'm not sure how this interacts with PEP 669, but I'm sure@markshannon can answer that pretty easily.

Also, I'd kinda like to see buildbots (for correctness) and benchmarks (for curiousity) on this before merging. I can run the benchmarks for you whenever you think it's ready.

PyObject *const *newargs = has_dict
? _PyStack_UnpackDict(tstate, _PyTuple_ITEMS(callargs),
nargs, kwargs, &kwnames)
: &PyTuple_GET_ITEM(callargs, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is this safe iflen(callargs) == 0?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It should be. Whenlen(callargs) == 0 the args shouldn't be used by the vectorcall protocol.

brandtbucher reacted with thumbs up emoji
@Fidget-Spinner
Copy link
MemberAuthor

Not sure why all the#line directives in the generated_cases.c.h file were removed after I ran the script. Does anyone know?

@Fidget-Spinner
Copy link
MemberAuthor

Also, I'd kinda like to see buildbots (for correctness) and benchmarks (for curiousity) on this before merging. I can run the benchmarks for you whenever you think it's ready.

You can run the benchmarks but I expect no change sadly. Extended function calls are used rarely in pyperformance.

@Fidget-SpinnerFidget-Spinner added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelApr 27, 2023
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@Fidget-Spinner for commitdfd6b01 🤖

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelApr 27, 2023
@brandtbucher
Copy link
Member

Not sure why all the#line directives in the generated_cases.c.h file were removed after I ran the script. Does anyone know?

The regen script needs to be run with--emit-line-directives. I think the Makefile was updated to use the new option, but maybe the Windows build scripts weren't updated with the new option (which is off by default). Maybe you know the right place to add it?

You can run the benchmarks but I expect no change sadly. Extended function calls are used rarely in pyperformance.

Understood. I just kicked off a run anyways. :)

Fidget-Spinner reacted with heart emoji

Co-authored-by: Carl Meyer <carl@oddbird.net>
@brandtbucher
Copy link
Member

Yep,no real change on pyperformance (as expected).

@arhadthedev
Copy link
Member

Removing sprint because this issue missed the ones in 2022 and 2023.

@Fidget-Spinner
Copy link
MemberAuthor

Removing sprint because this issue missed the ones in 2022 and 2023.

Yeah this issue was added in the 2022 CPython core dev sprint which us some ways back now.

@Fidget-SpinnerFidget-Spinner merged commited95e8c intopython:mainApr 30, 2023
@Fidget-SpinnerFidget-Spinner deleted the call_function_ex_inline_3 branchApril 30, 2023 13:08
@Fidget-Spinner
Copy link
MemberAuthor

Thanks everyone!

carljm added a commit to carljm/cpython that referenced this pull requestMay 1, 2023
* main: (26 commits)pythongh-104028: Reduce object creation while calling callback function from gc (pythongh-104030)pythongh-104036: Fix direct invocation of test_typing (python#104037)pythongh-102213: Optimize the performance of `__getattr__` (pythonGH-103761)pythongh-103895: Improve how invalid `Exception.__notes__` are displayed (python#103897)  Adjust expression from `==` to `!=` in alignment with the meaning of the paragraph. (pythonGH-104021)pythongh-88496: Fix IDLE test hang on macOS (python#104025)  Improve int test coverage (python#104024)pythongh-88773: Added teleport method to Turtle library (python#103974)pythongh-104015: Fix direct invocation of `test_dataclasses` (python#104017)pythongh-104012: Ensure test_calendar.CalendarTestCase.test_deprecation_warning consistently passes (python#104014)pythongh-103977: compile re expressions in platform.py only if required (python#103981)pythongh-98003: Inline call frames for CALL_FUNCTION_EX (pythonGH-98004)  Replace Netlify with Read the Docs build previews (python#103843)  Update name in acknowledgements and add mailmap (python#103696)pythongh-82054: allow test runner to split test_asyncio to execute in parallel by sharding. (python#103927)  Remove non-existing tools from Sundry skiplist (python#103991)pythongh-103793: Defer formatting task name (python#103767)pythongh-87092: change assembler to use instruction sequence instead of CFG (python#103933)pythongh-103636: issue warning for deprecated calendar constants (python#103833)  Various small fixes to dis docs (python#103923)  ...
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@carljmcarljmcarljm left review comments

@markshannonmarkshannonmarkshannon left review comments

@brandtbucherbrandtbucherbrandtbucher approved these changes

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

6 participants
@Fidget-Spinner@brandtbucher@markshannon@bedevere-bot@arhadthedev@carljm

[8]ページ先頭

©2009-2025 Movatter.jp