Feature or enhancement
Implementing PEP 703 will require adding fine-grained locks and other synchronization mechanisms. For good performance, it's important that these locks be "lightweight", in the sense that they take up little space and don't require memory allocations to create. It's also important that these locks are fast in the common uncontended case, perform reasonably under contention, and avoid thread starvation.
Platform-provided mutexes like `pthread_mutex_t` are large (40 bytes on x86-64 Linux), and our current cross-platform wrappers ([1], [2], [3]) require additional memory allocations.
I'm proposing a lightweight mutex (`PyMutex`), along with internal-only APIs used for building an efficient `PyMutex` as well as other synchronization primitives. The design is based on WebKit's `WTF::Lock` and `WTF::ParkingLot`, which are described in detail in the "Locking in WebKit" blog post. (The design has also been ported to Rust in the `parking_lot` crate.)
Public API
The public API (in `Include/cpython`) would provide a `PyMutex` that occupies one byte and can be zero-initialized:

```c
typedef struct PyMutex { uint8_t state; } PyMutex;

void PyMutex_Lock(PyMutex *m);
void PyMutex_Unlock(PyMutex *m);
```
I'm proposing making `PyMutex` public because it's useful in C extensions such as NumPy, where (unlike in C++) it can be a pain to wrap cross-platform synchronization primitives.
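As a rough illustration of why a one-byte, zero-initializable lock can be cheap when uncontended, here is a minimal sketch using C11 `<stdatomic.h>`. `ToyMutex` and the `toy_mutex_*` functions are hypothetical names, not part of the proposal. Unlike the proposed `PyMutex` (a plain `uint8_t` field), the sketch declares the byte `_Atomic`, and its slow path simply yields with POSIX `sched_yield()`, whereas the proposal is meant to park waiting threads through the internal APIs described below.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-byte lock: zero means unlocked, so `ToyMutex m = {0};`
   is a valid initializer and no allocation is needed. */
typedef struct ToyMutex {
    _Atomic uint8_t state;
} ToyMutex;

enum { TOY_UNLOCKED = 0, TOY_LOCKED = 1 };

/* Try to take the lock with a single compare-and-swap. */
static bool toy_mutex_trylock(ToyMutex *m)
{
    uint8_t expected = TOY_UNLOCKED;
    return atomic_compare_exchange_strong_explicit(
        &m->state, &expected, TOY_LOCKED,
        memory_order_acquire, memory_order_relaxed);
}

static void toy_mutex_lock(ToyMutex *m)
{
    /* Fast path: uncontended, one CAS and done. */
    if (toy_mutex_trylock(m)) {
        return;
    }
    /* Slow path (sketch only): yield and retry. A real lock would park
       the thread here instead of spinning. */
    while (!toy_mutex_trylock(m)) {
        sched_yield();
    }
}

static void toy_mutex_unlock(ToyMutex *m)
{
    uint8_t prev = atomic_exchange_explicit(&m->state, TOY_UNLOCKED,
                                            memory_order_release);
    assert(prev == TOY_LOCKED && "unlock of a ToyMutex that was not locked");
    (void)prev;  /* silence unused-variable warnings under NDEBUG */
}
```

An uncontended lock/unlock pair costs one compare-and-swap plus one atomic exchange. A yield loop like this does nothing to prevent starvation, which is one of the goals stated earlier for these locks.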
Internal APIs
The internal-only API (in `Include/internal`) would provide APIs for building `PyMutex` and other synchronization primitives. The main addition is a compare-and-wait primitive, like Linux's `futex` or Windows' `WaitOnAddress`.
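To illustrate compare-and-wait semantics, here is a minimal Linux-only sketch wrapping the raw `futex` system call (glibc provides no `futex()` wrapper, so it goes through `syscall(2)`). The helper names `futex_wait` and `futex_wake` are hypothetical. Note that Linux futexes operate only on aligned 32-bit words, while `WaitOnAddress` accepts 1-, 2-, 4-, or 8-byte values, so a one-byte `PyMutex` cannot be waited on with a futex directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>          /* callers check errno for EAGAIN / EINTR */
#include <linux/futex.h>    /* FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE */
#include <stdint.h>
#include <sys/syscall.h>    /* SYS_futex */
#include <unistd.h>         /* syscall() */

/* Compare-and-wait: sleep only if *addr still equals `expected`. The kernel
   does the comparison atomically with queueing the caller, so a wake-up sent
   after another thread changes the value cannot be missed. Returns 0 when
   woken (possibly spuriously), or -1 with errno set to EAGAIN if
   *addr != expected, or EINTR if interrupted by a signal. */
static int futex_wait(uint32_t *addr, uint32_t expected)
{
    assert(((uintptr_t)addr & 3) == 0);  /* futex words must be 4-byte aligned */
    return (int)syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected,
                        NULL, NULL, 0);
}

/* Wake at most `n` threads waiting on `addr`; returns how many were woken. */
static int futex_wake(uint32_t *addr, int n)
{
    return (int)syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
}
```

Calling `futex_wait(&word, 0)` while `word` holds 1 returns immediately with `EAGAIN` instead of blocking; that "only sleep if the value is still what I saw" check is what lets a lock avoid lost wake-ups.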
```c
int _PyParkingLot_Park(const void *address, const void *expected,
                       size_t address_size, _PyTime_t timeout_ns,
                       void *arg, int detach);
```
The API closely matches `WaitOnAddress`, but with two additions: `arg` is an optional, arbitrary pointer passed to the waking thread, and `detach` indicates whether to release the GIL (or detach in `--disable-gil` builds) while waiting. The additional `arg` pointer allows the locks to be only one byte (instead of at least pointer-sized), since it allows passing additional (stack-allocated) data between the waiting and the waking thread.
The wakeup API looks like:
```c
// wake up all threads waiting on `address`
void _PyParkingLot_UnparkAll(const void *address);

// or wake up a single thread
_PyParkingLot_Unpark(address, unpark, {
    // code here is executed after the thread to be woken up is identified
    // but before we wake it up
    void *arg = unpark->arg;
    int more_waiters = unpark->more_waiters;
    ...
});
```
`_PyParkingLot_Unpark` is currently a macro that takes a code block. For `PyMutex`, we need to update the mutex bits after we identify the thread but before we actually wake it up.
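To make the park/unpark contract concrete, here is a heavily simplified, hypothetical parking lot using POSIX threads and the GCC/Clang `__atomic` builtins. It is not CPython's implementation: the `toy_*` names are invented, it uses one global lock instead of hashing addresses into buckets, it replaces the macro's code block with a function-pointer callback, and it omits timeouts, GIL detaching, and unpark-all. It does show the two properties discussed above: the value is re-checked under the same lock a waker takes, so no wake-up is lost, and the callback runs after a waiter is chosen but before it is woken, receiving that waiter's `arg` and whether more threads remain parked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_PARK_OK = 0, TOY_PARK_AGAIN = 1 };

typedef struct toy_waiter {
    const void *address;
    void *arg;                  /* data handed from the parked thread to the waker */
    bool woken;
    pthread_cond_t cond;
    struct toy_waiter *next;
} toy_waiter;

static pthread_mutex_t lot_mu = PTHREAD_MUTEX_INITIALIZER;
static toy_waiter *lot_head = NULL;  /* FIFO list of parked threads */

/* Atomically load `size` bytes (1, 2, 4 or 8) and compare with *expected. */
static bool toy_matches(const void *address, const void *expected, size_t size)
{
    switch (size) {
    case 1: return __atomic_load_n((const uint8_t *)address, __ATOMIC_SEQ_CST) == *(const uint8_t *)expected;
    case 2: return __atomic_load_n((const uint16_t *)address, __ATOMIC_SEQ_CST) == *(const uint16_t *)expected;
    case 4: return __atomic_load_n((const uint32_t *)address, __ATOMIC_SEQ_CST) == *(const uint32_t *)expected;
    case 8: return __atomic_load_n((const uint64_t *)address, __ATOMIC_SEQ_CST) == *(const uint64_t *)expected;
    default: assert(!"unsupported size"); return false;
    }
}

/* Block while *address equals *expected. Returns TOY_PARK_AGAIN at once if
   the value already differs, or TOY_PARK_OK after toy_unpark wakes us. */
static int toy_park(const void *address, const void *expected, size_t size, void *arg)
{
    pthread_mutex_lock(&lot_mu);
    /* Wakers change the value, then take lot_mu in toy_unpark, so checking
       here under lot_mu cannot miss a wake-up. */
    if (!toy_matches(address, expected, size)) {
        pthread_mutex_unlock(&lot_mu);
        return TOY_PARK_AGAIN;
    }
    toy_waiter w = { .address = address, .arg = arg, .woken = false, .next = NULL };
    pthread_cond_init(&w.cond, NULL);
    toy_waiter **pp = &lot_head;
    while (*pp) pp = &(*pp)->next;   /* append: FIFO order helps avoid starvation */
    *pp = &w;
    while (!w.woken)                 /* loop guards against spurious wake-ups */
        pthread_cond_wait(&w.cond, &lot_mu);
    pthread_mutex_unlock(&lot_mu);
    pthread_cond_destroy(&w.cond);   /* waker no longer touches w once lot_mu is free */
    return TOY_PARK_OK;
}

/* Runs after the waiter is chosen but before it is woken. */
typedef void (*toy_unpark_fn)(void *ctx, void *arg, bool more_waiters);

/* Wake the oldest thread parked on `address`; returns true if one was woken. */
static bool toy_unpark(const void *address, toy_unpark_fn fn, void *ctx)
{
    pthread_mutex_lock(&lot_mu);
    toy_waiter **pp = &lot_head;
    while (*pp && (*pp)->address != address) pp = &(*pp)->next;
    toy_waiter *w = *pp;
    if (w != NULL) {
        *pp = w->next;                       /* dequeue the chosen waiter */
        bool more = false;
        for (toy_waiter *o = w->next; o != NULL && !more; o = o->next)
            more = (o->address == address);
        if (fn) fn(ctx, w->arg, more);       /* e.g. update lock bits here */
        w->woken = true;
        pthread_cond_signal(&w->cond);
    }
    pthread_mutex_unlock(&lot_mu);
    return w != NULL;
}
```

In a WebKit-style one-byte lock, the unlocking thread would pass a callback that stores an unlocked state into the lock byte, keeping a "has parked" bit set when `more_waiters` is true. Doing that store inside the callback, while `lot_mu` is still held, means a thread concurrently trying to park either is already counted in `more_waiters` or re-checks the byte after the store.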