Description
PEP 574 (scheduled for Python 3.8) introduces pickle protocol 5 with support for no-copy pickling of large mutable buffers.
I made a small proof-of-concept benchmark script using @pitrou's pickle5 backport of his draft implementation of PEP 574.
See: https://gist.github.com/ogrisel/a2b0e5ae4987a398caa7f9277cb3b90a
The meat lies in the following reducer:

    import numpy as np
    from pickle5 import PickleBuffer


    def _array_from_buffer(buffer, dtype, shape):
        # Rebuild the array as a view on the buffer, without copying.
        return np.frombuffer(buffer, dtype=dtype).reshape(shape)


    def reduce_ndarray_pickle5(a):
        # This reducer assumes protocol 5 as currently there is no way to register
        # protocol-aware reduce function in the global copyreg dispatch table.
        if not a.dtype.hasobject and a.flags.c_contiguous:
            # No-copy pickling for C-contiguous arrays and protocol 5
            return _array_from_buffer, (PickleBuffer(a), a.dtype, a.shape), None
        else:
            # Fall-back to generic method
            return a.__reduce__()

This works as expected (no extra copy when dumping and loading) and also fixes the in-memory speed overhead reported by @mrocklin in #7544.
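
For illustration, here is a minimal sketch of how such a reducer could be exercised with out-of-band buffers to check the no-copy claim. It makes several assumptions that are not in the gist: it uses the standard-library `pickle` module (protocol 5 ships with Python 3.8+) instead of the `pickle5` backport, it uses the `np.ndarray` constructor as the reconstructor rather than the `_array_from_buffer` helper so the snippet has no module-level function to pickle by reference, and `dumps_out_of_band` is a hypothetical helper name.

```python
import copyreg
import io
import pickle  # assumption: Python 3.8+, where protocol 5 is in the stdlib

import numpy as np


def reduce_ndarray_pickle5(a):
    """Protocol-5-only reducer (variant of the one above, see lead-in)."""
    if not a.dtype.hasobject and a.flags.c_contiguous:
        # Wrap the array memory in a PickleBuffer so it can travel out-of-band.
        return np.ndarray, (a.shape, a.dtype, pickle.PickleBuffer(a))
    # Fall back to the generic bytes-based method.
    return a.__reduce__()


def dumps_out_of_band(obj):
    """Hypothetical helper: return (pickle payload, out-of-band buffers)."""
    buffers = []
    stream = io.BytesIO()
    pickler = pickle.Pickler(stream, protocol=5, buffer_callback=buffers.append)
    # A per-pickler dispatch table avoids touching the global copyreg table.
    pickler.dispatch_table = copyreg.dispatch_table.copy()
    pickler.dispatch_table[np.ndarray] = reduce_ndarray_pickle5
    pickler.dump(obj)
    return stream.getvalue(), buffers


a = np.arange(12, dtype=np.float64).reshape(3, 4)
payload, buffers = dumps_out_of_band(a)

# The buffers are handed back to the unpickler in the same order.
b = pickle.loads(payload, buffers=buffers)

# Non-contiguous arrays take the fallback path.
c = a[:, ::2]
payload_c, buffers_c = dumps_out_of_band(c)
c_restored = pickle.loads(payload_c, buffers=buffers_c)
```

In this in-process round trip, `np.shares_memory(a, b)` can be used to confirm that the restored array is a view on the original memory rather than a copy. When the buffers are sent over a socket or written to disk, the consumer decides how to transfer them; the pickle payload itself stays small.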
To get this in numpy, we would need to make a protocol-aware reduce function, that is, have `ndarray` implement a `__reduce_ex__` method that accepts a `protocol` argument instead of the existing `bytes`-based implementation from `array_reduce` in https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/methods.c#L1577. This bytes-based implementation should probably be kept as a fallback when `protocol < 5`.