
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2019-02-05 14:40 by pierreglaser, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| test_hook.py | pierreglaser, 2019-02-05 14:40 | |
| pickler_hook.patch | pierreglaser, 2019-02-05 14:42 | |
| pickler_hook.patch | pierreglaser, 2019-03-11 14:51 | |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 12499 | merged | pierreglaser, 2019-03-22 21:51 |
| PR 12588 | merged | pierreglaser, 2019-03-27 14:39 |
| Messages (7) | |||
|---|---|---|---|
| msg334870 | Author: Pierre Glaser (pierreglaser) | Date: 2019-02-05 14:40 | |

Pickler objects provide a dispatch_table attribute, where the user can specify custom saving functions depending on the type of the object to be saved. However, for performance purposes, this table is preceded (in the C implementation only) by a hardcoded switch that takes care of saving many built-in types without a lookup in the dispatch_table.

In particular, it is not possible to define custom saving methods for functions and classes, although the current default (save_global, which saves an object using its module attribute path) is likely to fail at pickling or unpickling time in many cases.

The aforementioned failures exist on purpose in the standard library (as a way to allow for the serialization of functions accessible from non-dynamic (*) modules only). However, there exist cases where serializing functions from dynamic modules matters. These cases are currently handled thanks to the cloudpickle module (https://github.com/cloudpipe/cloudpickle), which is used by many distributed data-science frameworks such as pyspark, ray and dask. For the reasons explained above, cloudpickle's Pickler subclass derives from the Python Pickler class instead of its C class, which severely harms its performance.

While prototyping with Antoine Pitrou, we came to the conclusion that a hook could be added to the C Pickler class, in which an optional user-defined callback would be invoked (if defined) when saving functions and classes instead of the traditional save_global. Here is a patch so that we have something concrete to discuss.

(*) Dynamic modules are modules that cannot be imported by name the way traditional file-backed Python modules can. Examples include the __main__ module, which can be populated dynamically by running a script or by a user writing code in a Python shell or Jupyter notebook.
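As an illustration of the limitation described in this message, here is a minimal sketch (not part of the original report). It registers two reducers in a pickler's dispatch_table: one for fractions.Fraction, chosen only because it is an importable stdlib class, and one for plain functions. The helper names reduce_fraction, reduce_function and dump_with_table are hypothetical. The expectation is that the Fraction entry is consulted, while the function entry is never called because functions are handled before the table lookup.

```python
import io
import pickle
import types
from fractions import Fraction

calls = []  # records which custom reducers were actually invoked


def reduce_fraction(obj):
    # Custom reducer for Fraction: rebuild from numerator and denominator.
    calls.append("Fraction")
    return Fraction, (obj.numerator, obj.denominator)


def reduce_function(func):
    # Custom reducer we would like to use for functions (hypothetical).
    calls.append("function")
    return str, (func.__qualname__,)


def dump_with_table(obj):
    # Pickle obj with a per-pickler dispatch_table holding both reducers.
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=pickle.HIGHEST_PROTOCOL)
    pickler.dispatch_table = {
        Fraction: reduce_fraction,
        types.FunctionType: reduce_function,
    }
    pickler.dump(obj)
    return buf.getvalue()


# The Fraction entry is honored.
restored = pickle.loads(dump_with_table(Fraction(3, 4)))

# The function entry is bypassed: the lambda goes through save_global,
# which cannot find it by name, so pickling fails.
try:
    dump_with_table(lambda x: x)
except (pickle.PicklingError, AttributeError, ImportError):
    function_pickled = False
else:
    function_pickled = True
```

After running this, calls should contain only the "Fraction" entry, which is the behavior the proposed hook is meant to work around.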
| msg334872 | Author: Antoine Pitrou (pitrou) | Date: 2019-02-05 15:40 | |

FYI, I've removed the duplicate message :-) Also adding Serhiy as cc.
| msg335404 | Author: Olivier Grisel (Olivier.Grisel) | Date: 2019-02-13 10:54 | |

Adding such a hook would make it possible to reimplement cloudpickle.CloudPickler by deriving from the fast _pickle.Pickler class (instead of the slow pickle._Pickler as done currently). This would mean rewriting most of the CloudPickler methods to rely only on a save_reduce-style design instead of directly calling pickle._Pickler.write and pickle._Pickler.save. This is tedious but doable.

There is however a blocker with the current way closures are set: when we pickle a dynamically defined function (e.g. lambda, nested function or function in __main__), we currently use a direct call to memoize (https://github.com/cloudpipe/cloudpickle/blob/v0.7.0/cloudpickle/cloudpickle.py#L594) so as to be able to refer to the function itself in its own closure without causing an infinite loop in CloudPickler.dump. This also makes it possible to pickle mutually recursive functions.

The easiest way to avoid having to call memoize explicitly would be to be able to pass the full __closure__ attribute in the state dict of the reduce call. Indeed, the save_reduce function calls memoize automatically after saving the reconstructor and its args but prior to saving the state:

https://github.com/python/cpython/blob/v3.7.2/Modules/_pickle.c#L3903-L3931

It would therefore be possible to pass a (state, slotstate) tuple with the closure in slotstate, so that it could be reconstructed at unpickling time with a setattr:

https://github.com/python/cpython/blob/v3.7.2/Modules/_pickle.c#L6258-L6272

However, it is currently not possible to setattr __closure__. We can only set individual closure cell contents (which is not compatible with the setattr state trick described above).

To summarize, we need to implement the setter function for the __closure__ attribute of functions and methods to make it natural to reimplement the CloudPickler by inheriting from _pickle.Pickler using the hook described in this issue.
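The constraint mentioned in this message can be seen with a small sketch (an illustration added here, not code from the thread). It assumes Python 3.7 or later, where cell_contents is writable; make_counter and counter are hypothetical names. Assigning to __closure__ as a whole is rejected, while the content of an individual cell can be replaced.

```python
def make_counter():
    count = 0

    def counter():
        # 'count' is captured in a closure cell of 'counter'.
        return count

    return counter


counter = make_counter()

# Replacing the whole __closure__ tuple is not allowed.
try:
    counter.__closure__ = ()
except AttributeError:
    closure_is_read_only = True
else:
    closure_is_read_only = False

# Individual cells, however, can be updated in place.
counter.__closure__[0].cell_contents = 41
value_after_update = counter()
```

This is why a plain setattr-based state restore cannot rebuild a closure: the state would have to be applied cell by cell rather than through a single attribute assignment.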
| msg337673 | Author: Pierre Glaser (pierreglaser) | Date: 2019-03-11 14:51 | |

Update:

Instead of changing permissions on some attributes of function objects (__globals__ and __closure__), we added an optional argument called state_setter to save_reduce. It expects a callable that will be saved inside the object's pickle string, and called when setting the state of the object instead of using the default way in load_build.

This allows for external flexibility when setting custom pickling behavior of built-in types (in our use-cases: functions and classes). I updated the patches so that anyone interested can take a look.

Also, we tested the cloudpickle package against these patches (see https://github.com/cloudpipe/cloudpickle/pull/253). The tests run fine, and we observe a 10-30x speedup for real-life use-cases. We are starting to hit convergence on the implementation :)
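A minimal sketch of how a state setter can be used from Python code, assuming Python 3.8 or later, where the reduce value may carry an optional sixth item, a callable with an (obj, state) signature. The Account class and the helpers _empty_account and _set_account_state are hypothetical and exist only for this illustration; they are defined at module level so that pickle can save them by reference.

```python
import pickle


class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

    def __reduce__(self):
        state = {"owner": self.owner, "balance": self.balance}
        # Six-item reduce value: (callable, args, state, listitems,
        # dictitems, state_setter). Items four and five are unused here.
        return (_empty_account, (), state, None, None, _set_account_state)


def _empty_account():
    # Reconstructor: create the instance without running __init__.
    return Account.__new__(Account)


def _set_account_state(obj, state):
    # Called at load time with (obj, state) in place of the default
    # state handling (__setstate__ or a __dict__ update).
    obj.owner = state["owner"]
    obj.balance = state["balance"]
    obj.restored_by_setter = True


restored = pickle.loads(pickle.dumps(Account("ada", 10)))
```

The restored_by_setter flag is only there to make it visible that the custom callable, and not the default state handling, populated the object.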
| msg341933 | Author: Antoine Pitrou (pitrou) | Date: 2019-05-08 19:40 | |

New changeset 65d98d0f53f558d7c799098da0abf376068c15fd by Antoine Pitrou (Pierre Glaser) in branch 'master':
bpo-35900: Add a state_setter arg to save_reduce (GH-12588)
https://github.com/python/cpython/commit/65d98d0f53f558d7c799098da0abf376068c15fd
| msg341944 | Author: Antoine Pitrou (pitrou) | Date: 2019-05-08 21:08 | |

New changeset 289f1f80ee87a4baf4567a86b3425fb3bf73291d by Antoine Pitrou (Pierre Glaser) in branch 'master':
bpo-35900: Enable custom reduction callback registration in _pickle (GH-12499)
https://github.com/python/cpython/commit/289f1f80ee87a4baf4567a86b3425fb3bf73291d
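For readers wondering what the custom reduction callback looks like in use, here is a toy sketch assuming Python 3.8 or later, where pickle.Pickler subclasses can define a reducer_override method. It is not how cloudpickle works: it only handles closure-free lambdas, serializes their code object with marshal, and ignores globals and defaults. The names LambdaPickler, _rebuild_function and dumps_with_lambdas are hypothetical.

```python
import builtins
import io
import marshal
import pickle
import types


def _rebuild_function(code_bytes, name):
    # Reconstructor: rebuild a function from its marshalled code object.
    code = marshal.loads(code_bytes)
    return types.FunctionType(code, {"__builtins__": builtins}, name)


class LambdaPickler(pickle.Pickler):
    def reducer_override(self, obj):
        # Only intercept closure-free lambdas; everything else, including
        # _rebuild_function itself, falls back to the default behavior.
        if (
            isinstance(obj, types.FunctionType)
            and obj.__name__ == "<lambda>"
            and obj.__closure__ is None
        ):
            return _rebuild_function, (marshal.dumps(obj.__code__), obj.__name__)
        return NotImplemented


def dumps_with_lambdas(obj):
    # Convenience wrapper around LambdaPickler.
    buf = io.BytesIO()
    LambdaPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buf.getvalue()


double = pickle.loads(dumps_with_lambdas(lambda x: x * 2))
result = double(21)
```

Returning NotImplemented for everything else matters here: without that guard, the pickler would try to reduce _rebuild_function through the same path and recurse.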
| msg341945 | Author: Antoine Pitrou (pitrou) | Date: 2019-05-08 21:08 | |

Both PRs are now merged. Thank you Pierre!
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:10 | admin | set | github: 80081 |
| 2019-05-08 21:08:54 | pitrou | set | status: open -> closed resolution: fixed messages: +msg341945 stage: patch review -> resolved |
| 2019-05-08 21:08:30 | pitrou | set | messages: +msg341944 |
| 2019-05-08 19:40:28 | pitrou | set | messages: +msg341933 |
| 2019-03-27 14:39:40 | pierreglaser | set | pull_requests: +pull_request12530 |
| 2019-03-22 21:51:41 | pierreglaser | set | stage: patch review pull_requests: +pull_request12449 |
| 2019-03-11 14:51:39 | pierreglaser | set | files: +pickler_hook.patch messages: +msg337673 |
| 2019-02-13 10:54:45 | Olivier.Grisel | set | nosy: +Olivier.Grisel messages: +msg335404 |
| 2019-02-05 15:40:42 | pitrou | set | nosy: +serhiy.storchaka messages: +msg334872 |
| 2019-02-05 15:40:07 | pitrou | set | messages: -msg334871 |
| 2019-02-05 15:02:16 | SilentGhost | set | title: Add pickler hoor for the user to customize the serialization of user defined functions and types. -> Add pickler hook for the user to customize the serialization of user defined functions and types. |
| 2019-02-05 14:42:21 | pierreglaser | set | files: +pickler_hook.patch keywords: +patch messages: +msg334871 |
| 2019-02-05 14:40:22 | pierreglaser | create | |