Call design for Tier 2 (uops) interpreter #106581

Closed
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

Description

@gvanrossum

(Maybe this is tentative enough that it still belongs in the faster-cpython/ideas tracker, but I hope we're close enough that we can hash it out here. CC @markshannon, @brandtbucher)

(This is a WIP until I have looked a bit deeper into this.)

First order of business is splitting some of the CALL specializations into multiple ops satisfying the uop requirement: either use oparg and no cache entries, or don't use oparg and use at most one cache entry. For example, one of the more important ones, CALL_PY_EXACT_ARGS, uses both oparg (the number of arguments) and a cache entry (func_version). Splitting it into a guard and an action op is problematic: even discounting the possibility of encountering a bound method (i.e., assuming method is NULL), it contains the following DEOPT calls:

    // PyObject *callable = stack_pointer[-1-oparg];
    DEOPT_IF(tstate->interp->eval_frame, CALL);
    int argcount = oparg;
    DEOPT_IF(!PyFunction_Check(callable), CALL);
    PyFunctionObject *func = (PyFunctionObject *)callable;
    DEOPT_IF(func->func_version != func_version, CALL);
    PyCodeObject *code = (PyCodeObject *)func->func_code;
    DEOPT_IF(code->co_argcount != argcount, CALL);
    DEOPT_IF(!_PyThreadState_HasStackSpace(tstate, code->co_framesize), CALL);

If we wanted to combine all this in a single guard op, that guard would require access to both oparg (to dig out callable) and func_version. The fundamental problem is that the callable, which needs to be prodded and poked for the guard to pass, is buried under the arguments, and we need to use oparg to know how deep it is buried.

What if we somehow reversed this so that the callable is on top of the stack, after the arguments? We could arrange for this by adding a COPY n+1 opcode just before the CALL opcode (or its specializations). In fact, this could even be a blessing in disguise, since now we would no longer need to push a NULL before the callable to reserve space for self -- instead, if the callable is found to be a bound method, its self can overwrite the original callable (below the arguments) and the function extracted from the bound method can overwrite the copy of the callable above the arguments. This has the advantage of no longer needing to have a "push NULL" bit in several other opcodes (the LOAD_GLOBAL and LOAD_ATTR families -- we'll have to review the logic in LOAD_ATTR a bit more to make sure this can work).

(Note that the key reason why the callable is buried below the arguments is a requirement about evaluation order in expressions -- the language reference requires that in the expression F(X), where F and X themselves are possibly complex expressions, F is evaluated before X.)
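As a quick illustration of that ordering rule (a minimal Python example added here for clarity; the helper names are made up):

    # In F(X), the callee expression F is evaluated before the argument
    # expression X, which is why the callable ends up below the arguments.
    def get_callee():
        print("callee evaluated")
        return print

    def get_arg():
        print("argument evaluated")
        return "hello"

    get_callee()(get_arg())
    # Output:
    #   callee evaluated
    #   argument evaluated
    #   hello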

Comparing before and after, currently we have the following arrangement on the stack when CALL n or any of its specializations is reached:

    NULL
    callable
    arg[0]
    arg[1]
    ...
    arg[n-1]

This is obtained by e.g.

    PUSH_NULL
    LOAD_FAST callable
    <load n args>
    CALL n

or

    LOAD_GLOBAL (NULL + callable)
    <load n args>
    CALL n

or

    LOAD_ATTR (NULL|self + callable)
    <load n args>
    CALL n
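For reference, the current pattern can be inspected with the dis module; a minimal sketch (the comments paraphrase roughly what current CPython emits -- exact opcode spellings, operands, and offsets vary by version):

    import dis

    def demo(obj, x):
        len(x)          # roughly: LOAD_GLOBAL (NULL + len), LOAD_FAST x, CALL 1
        obj.method(x)   # roughly: LOAD_FAST obj, LOAD_ATTR (NULL|self + method), LOAD_FAST x, CALL 1

    dis.dis(demo)       # inspect the emitted CALL sequences on your own build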

Under my proposal the arrangement would change to

    callable
    arg[0]
    arg[1]
    ...
    arg[n-1]
    callable

and it would be obtained by

    LOAD_FAST callable  /  LOAD_GLOBAL callable  /  LOAD_ATTR callable
    <load n args>
    COPY n+1
    CALL n
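To make the effect of COPY n+1 concrete, here is a list-based simulation of the proposed sequence (illustrative only; the evaluation stack is modelled as a Python list, and COPY i copies the i-th item counting from the top, as the real opcode does):

    n = 2
    stack = []
    stack.append(print)            # LOAD_GLOBAL callable (no NULL pushed)
    stack.extend(["a", "b"])       # <load n args>
    stack.append(stack[-(n + 1)])  # COPY n+1: duplicate the buried callable to the top
    assert stack == [print, "a", "b", print]
    # CALL n would now find the callable on top of the stack, with the original
    # still sitting below the arguments.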

It would (perhaps) even be permissible for the guard to overwrite both copies of the callable if a method is detected, since it would change from

    self.func
    <n args>
    self.func

to

    self
    <n args>
    func

where we would be assured that func has type PyFunctionObject *. (However, I think we ought to have separate specializations for the two cases, since the transformation would also require bumping oparg.)
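A pure-Python sketch of that rewrite, modelling the stack as a list (the helper name and layout are illustrative; the real guard would be a C uop operating on the interpreter's stack):

    def rewrite_if_bound_method(stack, nargs):
        """Entry layout: [..., callable, arg[0] .. arg[nargs-1], callable_copy]."""
        top = stack[-1]
        if hasattr(top, "__self__") and hasattr(top, "__func__"):   # bound method?
            stack[-2 - nargs] = top.__self__   # self overwrites the buried callable
            stack[-1] = top.__func__           # plain function replaces the top copy
            return nargs + 1                   # oparg must be bumped by one
        return nargs

After the rewrite the layout matches the second diagram above (self, the n original arguments, then func), and the call proceeds with oparg bumped to n+1.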

The runtime cost would be an extra COPY instruction before each CALL; however, I think this might actually be simpler than the dynamic check for bound methods, at least when using copy-and-patch.

Another cost would be requiring extra specializations for some cases that currently dynamically decide between function and method; but again I think that with copy-and-patch that is probably worth it, given that we expect that dynamic check to always go the same way for a specific location.
