Description
Feature or enhancement
Improve the performance of asdict/astuple in common cases by adding a shortcut in the inner loop for common types that are unaffected by deepcopy. Also special-case the default dict_factory=dict to construct the dictionary directly.
The goal here is to improve performance in common cases without significantly impacting less common cases, while not changing the API or output in any way.
Pitch
In cases where a dataclass contains a lot of data of common Python types (e.g. bool/str/int/float), the inner loops for asdict and astuple currently check each value against dataclasses, namedtuples, lists, tuples, and then dictionaries before passing it to deepcopy. This proposes to special-case and shortcut objects of types where deepcopy returns the object unchanged.
It is much faster for these cases to instead check for them at the first opportunity and return them immediately, skipping the recursive call and all of the other comparisons. When this is being used to prepare an object for serialization to JSON, the gain can be quite significant, as this covers most of the remaining types handled by the stdlib json module.
Note: Anything that skips deepcopy with this alteration is already unchanged, as deepcopy(obj) is obj is always True for these types.
Currently, when constructing the dict for a dataclass, a list of tuples is created and passed to the dict_factory constructor. In the case where the dict_factory constructor is the default, dict, it is faster to construct the dictionary directly.
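The two construction paths can be sketched as follows. This is a simplified, non-recursive illustration under stated assumptions: `Point`, `to_dict_via_factory`, and `to_dict_direct` are hypothetical names, and the real implementation also recurses into field values.

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    # Hypothetical example dataclass.
    x: int
    y: int


def to_dict_via_factory(obj, dict_factory=dict):
    # General path: collect (name, value) pairs, then pass them to the factory.
    result = []
    for f in fields(obj):
        result.append((f.name, getattr(obj, f.name)))
    return dict_factory(result)


def to_dict_direct(obj):
    # Shortcut for the default factory: build the dict without the intermediate list.
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


p = Point(1, 2)
via_factory = to_dict_via_factory(p)
direct = to_dict_direct(p)
print(via_factory == direct)
```

The general path still has to exist for custom factories, which receive the list of pairs; only the default case can take the direct route.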
Previous discussion
Discussed here with a few more details and earlier examples: https://discuss.python.org/t/dataclasses-make-asdict-astuple-faster-by-skipping-deepcopy-for-objects-where-deepcopy-obj-is-obj/24662
Code Details
Types to skip deepcopy
This is the current set of types to be checked for and returned via the shortcut, ordered in a way that I think makes more sense for dataclasses than the original ordering copied from the copy module. These are known to be safe to skip as they are all sent to _deepcopy_atomic (which returns the original object) in the copy module.
# Types for which deepcopy(obj) is known to return obj unmodified
# Used to skip deepcopy in asdict and astuple for performance
_ATOMIC_TYPES = {
    # Common JSON Serializable types
    types.NoneType,
    bool,
    int,
    float,
    complex,
    bytes,
    str,
    # Other types that are also unaffected by deepcopy
    types.EllipsisType,
    types.NotImplementedType,
    types.CodeType,
    types.BuiltinFunctionType,
    types.FunctionType,
    type,
    range,
    property,
    # weakref.ref,  # weakref is not currently imported by dataclasses directly
}
Function changes
With that added the change is essentially replacing each instance of
_asdict_inner(v, dict_factory)
inside _asdict_inner, with
v if type(v) in _ATOMIC_TYPES else _asdict_inner(v, dict_factory)
Instances of subclasses of these types are not guaranteed to have deepcopy(obj) is obj, so this checks specifically for instances of the base types.
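To illustrate why the check uses exact types rather than isinstance, here is a small sketch. `Label` is a hypothetical str subclass that carries mutable state, and `_ATOMIC_SUBSET` is an illustrative subset of the set above, not the proposed constant.

```python
import copy

# Illustrative subset of the atomic types listed above.
_ATOMIC_SUBSET = {type(None), bool, int, float, complex, bytes, str}


class Label(str):
    """Hypothetical str subclass whose instances can hold extra attributes."""


label = Label("name")
label.notes = ["draft"]  # mutable state attached to the instance

copied = copy.deepcopy(label)

is_same_object = copied is label           # deepcopy builds a new object here
exact_check = type(label) in _ATOMIC_SUBSET  # exact-type test rejects the subclass
loose_check = isinstance(label, str)       # isinstance would accept it

print(is_same_object, exact_check, loose_check)
```

With an isinstance test, `label` would be returned as-is and its `notes` list would be shared with the original; the exact-type test sends it down the existing deepcopy path instead.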
Performance tests
Test file: https://gist.github.com/DavidCEllis/a2c2ceeeeda2d1ac509fb8877e5fb60d
Results on my development machine (not a perfectly stable test machine, but these differences are large enough).
Main
Current Main python branch:
Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: main
Test Iterations: 10000
List of Int case asdict: 5.80s
Test Iterations: 1000
List of Decimal case asdict: 0.65s
Test Iterations: 1000000
Basic types case asdict: 3.76s
Basic types astuple: 3.48s
Test Iterations: 100000
Opaque types asdict: 2.15s
Opaque types astuple: 2.11s
Test Iterations: 100
Mixed containers asdict: 3.66s
Mixed containers astuple: 3.28s
Modified
Dataclasses asdict/astuple speed tests
--------------------------------------
Python v3.12.0alpha6
GIT branch: faster_dataclasses_serialize
Test Iterations: 10000
List of Int case asdict: 0.53s
Test Iterations: 1000
List of Decimal case asdict: 0.68s
Test Iterations: 1000000
Basic types case asdict: 1.33s
Basic types astuple: 1.28s
Test Iterations: 100000
Opaque types asdict: 2.14s
Opaque types astuple: 2.13s
Test Iterations: 100
Mixed containers asdict: 1.99s
Mixed containers astuple: 1.84s