Description
(Maybe this is tentative enough that it still belongs in the faster-cpython/ideas tracker, but I hope we're close enough that we can hash it out here. CC @markshannon, @brandtbucher)
(This is a WIP until I have looked a bit deeper into this.)
First order of business is splitting some of the `CALL` specializations into multiple ops satisfying the uop requirement: either use `oparg` and no cache entries, or don't use `oparg` and use at most one cache entry. For example, one of the more important ones, `CALL_PY_EXACT_ARGS`, uses both `oparg` (the number of arguments) and a cache entry (`func_version`). Splitting it into a guard and an action op is problematic: even discounting the possibility of encountering a bound method (i.e., assuming `method` is `NULL`), it contains the following `DEOPT` calls:

    // PyObject *callable = stack_pointer[-1-oparg];
    DEOPT_IF(tstate->interp->eval_frame, CALL);
    int argcount = oparg;
    DEOPT_IF(!PyFunction_Check(callable), CALL);
    PyFunctionObject *func = (PyFunctionObject *)callable;
    DEOPT_IF(func->func_version != func_version, CALL);
    PyCodeObject *code = (PyCodeObject *)func->func_code;
    DEOPT_IF(code->co_argcount != argcount, CALL);
    DEOPT_IF(!_PyThreadState_HasStackSpace(tstate, code->co_framesize), CALL);

If we wanted to combine all this in a single guard op, that guard would require access to both `oparg` (to dig out `callable`) and `func_version`. The fundamental problem is that the callable, which needs to be prodded and poked for the guard to pass, is buried under the arguments, and we need to use `oparg` to know how deep it is buried.
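To make the constraint concrete, here is a minimal Python model of the guard under the current stack layout. It is only a sketch: the list-based stack, the `FakeFunction` class, and `guard_current_layout` are hypothetical stand-ins for illustration, not CPython APIs, and only the type, version, and argument-count checks are modeled.

```python
from dataclasses import dataclass


@dataclass
class FakeFunction:
    """Hypothetical stand-in for PyFunctionObject (illustration only)."""
    func_version: int
    co_argcount: int


def guard_current_layout(stack, oparg, func_version):
    """Return True if the modeled guard passes.

    Current layout: [..., NULL, callable, arg[0], ..., arg[oparg-1]].
    The callable sits below the arguments, so locating it needs oparg,
    and validating it needs the cached func_version: two inputs.
    """
    callable_ = stack[-1 - oparg]  # mirrors stack_pointer[-1-oparg]
    if not isinstance(callable_, FakeFunction):
        return False  # models DEOPT_IF(!PyFunction_Check(callable), CALL)
    if callable_.func_version != func_version:
        return False  # models the func_version check
    return callable_.co_argcount == oparg  # models the co_argcount check


f = FakeFunction(func_version=7, co_argcount=2)
stack = [None, f, "a", "b"]  # None models the NULL slot
```

In this model the guard cannot even find the callable without `oparg`, which is exactly what conflicts with the uop requirement.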
What if we somehow reversed this so that the callable is on top of the stack, after the arguments? We could arrange for this by adding a `COPY n+1` opcode just before the `CALL` opcode (or its specializations). In fact, this could even be a blessing in disguise, since now we would no longer need to push a `NULL` before the callable to reserve space for `self` -- instead, if the callable is found to be a bound method, its `self` can overwrite the original callable (below the arguments) and the function extracted from the bound method can overwrite the copy of the callable above the arguments. This has the advantage of no longer needing to have a "push `NULL`" bit in several other opcodes (the `LOAD_GLOBAL` and `LOAD_ATTR` families -- we'll have to review the logic in `LOAD_ATTR` a bit more to make sure this can work).
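The effect of `COPY n+1` can be sketched with a small Python model. This is an illustration under assumptions: the stack is a plain list, `copy` models the documented `COPY i` semantics, and `FakeFunction` and `guard_version_on_top` are hypothetical names. Only the version check is modeled.

```python
class FakeFunction:
    """Hypothetical stand-in for PyFunctionObject (illustration only)."""

    def __init__(self, func_version):
        self.func_version = func_version


def copy(stack, i):
    """Model of COPY i: push the i-th item from the top (1-based)."""
    stack.append(stack[-i])


def guard_version_on_top(stack, func_version):
    """Hypothetical guard that inspects only the top of the stack.

    Because the callable is on top, no oparg is needed to find it;
    the cached func_version is the guard's only input.
    """
    top = stack[-1]
    return getattr(top, "func_version", None) == func_version


n = 2
f = FakeFunction(func_version=7)
stack = [f, "a", "b"]  # callable followed by n arguments
copy(stack, n + 1)     # callable is now also on top of the stack
```

After the copy, the modeled stack is `callable arg[0] arg[1] callable`, and the guard reads a fixed stack position.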
(Note that the key reason why the callable is buried below the arguments is a requirement about evaluation order in expressions -- the language reference requires that in the expression `F(X)` where `F` and `X` themselves are possibly complex expressions, `F` is evaluated before `X`.)
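The evaluation-order requirement is easy to observe from Python itself. The following small demonstration uses hypothetical helper names (`make_callable`, `make_argument`) that record when each sub-expression runs:

```python
trace = []


def make_callable():
    """Plays the role of the expression F; records when it is evaluated."""
    trace.append("F")
    return lambda x: x * 2


def make_argument():
    """Plays the role of the expression X; records when it is evaluated."""
    trace.append("X")
    return 21


# F is evaluated first, then X, and only then is the call performed.
result = make_callable()(make_argument())
```

Since the callable expression runs first, its value is pushed first and ends up underneath the arguments.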
Comparing before and after, currently we have the following arrangement on the stack when `CALL n` or any of its specializations is reached:

    NULL callable arg[0] arg[1] ... arg[n-1]

This is obtained by e.g.

    PUSH_NULL
    LOAD_FAST callable
    <load n args>
    CALL n

or

    LOAD_GLOBAL (NULL + callable)
    <load n args>
    CALL n

or

    LOAD_ATTR (NULL|self + callable)
    <load n args>
    CALL n

Under my proposal the arrangement would change to

    callable arg[0] arg[1] ... arg[n-1] callable

and it would be obtained by

    LOAD_FAST callable / LOAD_GLOBAL callable / LOAD_ATTR callable
    <load n args>
    COPY n+1
    CALL n

It would (perhaps) even be permissible for the guard to overwrite both copies of the callable if a method is detected, since it would change from

    self.func <n args> self.func

to

    self <n args> func

where we would be assured that `func` has type `PyFunctionObject *`. (However, I think we ought to have separate specializations for the two cases, since the transformation would also require bumping `oparg`.)
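The bound-method rewrite can be sketched in Python as follows. This is a model under assumptions, not interpreter code: the stack is a list, `unpack_bound_method` and `Greeter` are hypothetical names, and `types.MethodType` stands in for the bound-method check.

```python
import types


def unpack_bound_method(stack, oparg):
    """Model of the proposed rewrite when a bound method is detected.

    Before: [..., self.func, arg[0..n-1], self.func]
    After:  [..., self,      arg[0..n-1], func]
    Returns the bumped oparg, since self now counts as an argument.
    """
    method = stack[-1]
    if not isinstance(method, types.MethodType):
        raise TypeError("top of stack is not a bound method")
    assert stack[-2 - oparg] is method   # the original, below the arguments
    stack[-2 - oparg] = method.__self__  # self overwrites the original
    stack[-1] = method.__func__          # plain function overwrites the copy
    return oparg + 1


class Greeter:
    def greet(self, name):
        return f"hello {name}"


g = Greeter()
m = g.greet
stack = [m, "world", m]  # layout after COPY n+1 with n == 1
new_oparg = unpack_bound_method(stack, 1)

func = stack.pop()            # the plain function is on top
args = stack[-new_oparg:]     # self plus the original arguments
result = func(*args)
```

The returned `oparg + 1` is the "bumping" mentioned above: after the rewrite, the call site looks like an ordinary function call with one extra argument.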
The runtime cost would be an extra `COPY` instruction before each `CALL`; however I think this might actually be simpler than the dynamic check for bound methods, at least when using copy-and-patch.
Another cost would be requiring extra specializations for some cases that currently dynamically decide between function and method; but again I think that with copy-and-patch that is probably worth it, given that we expect that dynamic check to always go the same way for a specific location.
Linked PRs
- gh-106581: Add 10 new opcodes by allowing `assert(kwnames == NULL)` #106707
- gh-106581: Split `CALL_PY_EXACT_ARGS` into uops #107760
- gh-106581: Start projecting through calls #107793
- gh-106581: Project through calls #108067
- gh-106581: Fix two bugs in the code generator's copy optimization #108380
- gh-106581: Split CALL_BOUND_METHOD_EXACT_ARGS into uops #108462
- GH-106581: Fix instrumentation in tier 2 #108493
- gh-106581: Support multiple uops pushing new values #108895
- gh-106581: Honor 'always_exits' in write_components() #109338