Description
Feature or enhancement
The `object_lookup_special` microbenchmark in `Tools/ftscalingbench/ftscalingbench.py` currently doesn't scale well and is indicative of a broader free-threading (FT) performance issue that we should fix. The benchmark just calls `round()` from multiple threads concurrently:
cpython/Tools/ftscalingbench/ftscalingbench.py, lines 62 to 66 in 56d0f9a:

    def object_lookup_special():
        # round() uses `_PyObject_LookupSpecial()` internally.
        N = 1000 * WORK_SCALE
        for i in range(N):
            round(i / N)

The issue is that `round()` calls `_PyObject_LookupSpecial(number, &_Py_ID(__round__))`, which increments the reference count of the returned function (i.e., of `float.__round__`). The underlying function supports deferred reference counting, but `_PyObject_LookupSpecial` and `_PyType_LookupRef` do not take advantage of it.
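
To make the difference concrete, here is a toy model in standalone C. It is only a sketch under stated assumptions, not CPython's code: `ToyObject`, `ToyRef`, `TOY_TAG_DEFERRED` and the `toy_*` functions are all hypothetical names invented for illustration. It assumes a deferred reference can be marked with a tag bit in the reference itself, so that taking and releasing it never writes to the shared reference count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an object header; not the real PyObject. */
typedef struct {
    atomic_long refcnt;   /* shared counter that every thread would write to */
    bool deferred;        /* object opted in to deferred reference counting */
} ToyObject;

/* Hypothetical tagged reference: the low bit marks "not counted". */
typedef struct { uintptr_t bits; } ToyRef;
#define TOY_TAG_DEFERRED ((uintptr_t)1)

/* Strong lookup: always writes to the shared counter. */
static ToyObject *toy_lookup_strong(ToyObject *m) {
    atomic_fetch_add(&m->refcnt, 1);
    return m;
}

/* Tagged lookup: skips the counter write when the object is deferred. */
static ToyRef toy_lookup_stackref(ToyObject *m) {
    ToyRef r;
    if (m->deferred) {
        r.bits = (uintptr_t)m | TOY_TAG_DEFERRED;
    } else {
        atomic_fetch_add(&m->refcnt, 1);
        r.bits = (uintptr_t)m;
    }
    return r;
}

/* Release a tagged reference; deferred references need no decrement. */
static void toy_close(ToyRef r) {
    if (!(r.bits & TOY_TAG_DEFERRED)) {
        ToyObject *m = (ToyObject *)r.bits;
        atomic_fetch_sub(&m->refcnt, 1);
    }
}

/* Returns after_strong * 10 + after_deferred for a single deferred object. */
static long toy_demo_refcounts(void) {
    ToyObject method;
    atomic_init(&method.refcnt, 1);
    method.deferred = true;

    toy_lookup_strong(&method);
    long after_strong = atomic_load(&method.refcnt);
    atomic_fetch_sub(&method.refcnt, 1);   /* drop the strong reference */

    ToyRef r = toy_lookup_stackref(&method);
    long after_deferred = atomic_load(&method.refcnt);
    toy_close(r);

    return after_strong * 10 + after_deferred;
}
```

In this model the strong lookup bumps the counter while the tagged lookup leaves it untouched, which is the property that would let many threads look up the same method without writing to the same memory location.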
For the FT build, we also need some extra support in order to safely use `_PyStackRef` in `builtin_round_impl`, because it's important that all `_PyStackRef`s are visible to the GC. To support this, we can add a singly linked list of active `_PyStackRef`s to `_PyThreadStateImpl`.
`struct _PyCStackRef` pairs this linked-list pointer with a `_PyStackRef`. In the GIL-enabled build, there's no linked list and it's essentially the same as `_PyStackRef`.

    // A stackref that can be stored in a regular C local variable and be visible
    // to the GC in the free threading build.
    // Used in combination with _PyThreadState_PushCStackRef().
    typedef struct _PyCStackRef {
        _PyStackRef ref;
    #ifdef Py_GIL_DISABLED
        struct _PyCStackRef *next;
    #endif
    } _PyCStackRef;

    struct _PyThreadStateImpl {
        ...
        // Linked list (stack) of active _PyCStackRef
        struct _PyCStackRef *c_stack_refs;
        ...
    }

    static inline void
    _PyThreadState_PushCStackRef(PyThreadState *tstate, _PyCStackRef *ref) { ... }

    static inline void
    _PyThreadState_PopCStackRef(PyThreadState *tstate, _PyCStackRef *ref) { ... }
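
The push/pop bodies are elided above. As a rough illustration of how such a per-thread stack could behave, here is a standalone toy model in C. Everything in it is an assumption made for the sketch: `ToyStackRef`, `ToyCStackRef`, `ToyThreadState` and the `toy_*` functions are hypothetical stand-ins, not the proposed CPython implementation, and the "GC" is just a loop that counts the references it can reach.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the CPython definitions. */
typedef struct { void *obj; } ToyStackRef;

typedef struct ToyCStackRef {
    ToyStackRef ref;
    struct ToyCStackRef *next;    /* previous top of the per-thread stack */
} ToyCStackRef;

typedef struct {
    ToyCStackRef *c_stack_refs;   /* head of the linked list (stack) */
} ToyThreadState;

/* Link a C local into the thread's list so a collector can find it. */
static inline void toy_push(ToyThreadState *ts, ToyCStackRef *r) {
    r->ref.obj = NULL;            /* start empty; the caller fills it in later */
    r->next = ts->c_stack_refs;
    ts->c_stack_refs = r;
}

/* Unlink the most recently pushed entry; entries must be popped in LIFO order. */
static inline void toy_pop(ToyThreadState *ts, ToyCStackRef *r) {
    assert(ts->c_stack_refs == r);
    ts->c_stack_refs = r->next;
    r->ref.obj = NULL;
    r->next = NULL;
}

/* What a collector might do in this model: walk the list and count live refs. */
static int toy_count_visible(const ToyThreadState *ts) {
    int n = 0;
    for (const ToyCStackRef *r = ts->c_stack_refs; r != NULL; r = r->next) {
        if (r->ref.obj != NULL) {
            n++;
        }
    }
    return n;
}

/* Returns visible_during * 10 + visible_after for one pushed reference. */
static int toy_demo_gc_visibility(void) {
    ToyThreadState ts = { NULL };
    int method = 0;               /* placeholder for a looked-up method object */
    ToyCStackRef slot;

    toy_push(&ts, &slot);
    slot.ref.obj = &method;       /* e.g. the result of a special-method lookup */
    int during = toy_count_visible(&ts);
    toy_pop(&ts, &slot);
    int after = toy_count_visible(&ts);

    return during * 10 + after;
}
```

Because the entries live in C locals and are linked in strict LIFO order, pushing and popping in this model needs no allocation, and the reference is reachable from the thread state for exactly as long as it is in use.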