I'm not attached to _ATOMIC_TYPES as the name for the set, but 'atomic' matched how they're referred to in copy and I didn't have anything more suitable.

Issue: Dataclasses - Improve the performance of asdict/astuple for common types and default values #103000

DavidCEllis added 6 commits

March 20, 2023 17:41

Special case 'atomic' types that are not deepcopied

0af5e1d

Special case dict_factory=dict

2c76ef8

Add comment explaining _ATOMIC_TYPES

c58017c

Merge branch 'python:main' into faster_dataclasses_serialize

8fdbdb2

Merge branch 'python:main' into faster_dataclasses_serialize

4730a13

Reorder atomic types and clarify intention

3a650a1

DavidCEllis requested a review from ericvsmith as a code owner

March 24, 2023 15:11

bedevere-bot mentioned this pull request

Mar 24, 2023

Dataclasses - Improve the performance of asdict/astuple for common types and default values #103000

Closed

bedevere-bot added the awaiting review label

Mar 24, 2023

AlexWaygood requested a review from carljm

March 24, 2023 16:15

AlexWaygood added the type-feature (A feature request or enhancement), performance (Performance or resource usage), stdlib (Python modules in the Lib dir), and 3.12 (only security fixes) labels

Mar 24, 2023

AlexWaygood reviewed

Mar 24, 2023

Lib/dataclasses.py (outdated, resolved)

carljm reviewed

Mar 24, 2023

Member

carljm left a comment

One idea for a test that could be added here: we could add a test that dataclasses' _ATOMIC_TYPES is a subset of deepcopy's. This would protect against a future change to either of those lists breaking the semantic consistency that everything is deep-copied in asdict/astuple.

Lib/dataclasses.py (outdated, resolved)

Member

carljm commented Mar 24, 2023

I think this PR should include a news entry for the performance improvement.

carljm requested changes

Mar 24, 2023

Member

carljm left a comment

Requesting the mentioned inline changes, the suggested test, the news entry, and the change to the simpler version that tests against _ATOMIC_TYPES just once at the top of the functions.

Thanks for finding and working on this optimization! Quite an impressive perf win either way.
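For readers following along, a minimal sketch of what "testing once at the top of the function" looks like. This is a simplified stand-in, not the code in Lib/dataclasses.py: the names ATOMIC and to_dict are hypothetical, the type set is an illustrative subset, and namedtuple handling is left out.

```python
import copy
from dataclasses import dataclass, fields, is_dataclass

# Illustrative subset only; the real set in Lib/dataclasses.py is larger.
ATOMIC = frozenset({type(None), bool, int, float, str, complex, bytes})


def to_dict(obj):
    """Simplified stand-in for _asdict_inner with one check at the top."""
    if type(obj) in ATOMIC:
        # Nothing to copy or recurse into: skip every later type check.
        return obj
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: to_dict(getattr(obj, f.name)) for f in fields(obj)}
    if isinstance(obj, (list, tuple)):
        # Plain lists and tuples only; namedtuples are not handled here.
        return type(obj)(to_dict(v) for v in obj)
    if isinstance(obj, dict):
        return {to_dict(k): to_dict(v) for k, v in obj.items()}
    return copy.deepcopy(obj)


@dataclass
class Point:
    x: int
    y: float
    tags: list


p = Point(1, 2.5, ["a", "b"])
result = to_dict(p)
```

The less readable variant discussed in this thread would repeat the membership test inline at each call site to avoid a function call per value; the sketch above pays that call but keeps a single check.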

bedevere-bot added awaiting changes and removed awaiting review labels

Mar 24, 2023

bedevere-bot commented Mar 24, 2023

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

DavidCEllis and others added 5 commits

March 24, 2023 18:09

Update Lib/dataclasses.py

b6d8a6f

Improve comment describing _ATOMIC_TYPES

Co-authored-by: Carl Meyer <carl@oddbird.net>

Update Lib/dataclasses.py

bf320be

Remove comment referring to weakref

Co-authored-by: Carl Meyer <carl@oddbird.net>

Convert _ATOMIC_TYPES to frozenset

6280f2e

Correction - complex and bytes are not JSON serializable (by default …

7343fe6

…by the python stdlib json module).

Remove dict special case for now

b77c17e

Contributor, Author

DavidCEllis commented Mar 24, 2023

A few unresolved things:

- The dict_factory=dict optimisation will now be done in a separate follow-up PR.
- If I make a test for _ATOMIC_TYPES, is there a specific spot it should go?
- In the discussion thread Eric posted about getting copy to handle/provide the list/set of types
  - I think that's out of scope for this change and conceals why these types matter, but maybe worth looking into if it would be useful elsewhere
  - This could always be changed to use such a set if it was provided and if that was the right choice

With regard to the 'simpler' version, while it is a cleaner change I'm not sure it's the right decision to give up on the performance improvement that we can get now for the potential that changes to the interpreter might get some of it back in the future?

However I recognise I'm also not the one who has to maintain this and I do understand that it is a bit messier so I'll make the change if that's still requested. (The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front).

Member

AlexWaygood commented Mar 24, 2023 (edited)

> If I make a test for _ATOMIC_TYPES, is there a specific spot it should go?

I'm sort-of -0 on importing anything private from copy.py, whether it's for dataclasses itself or a dataclasses test. I don't like the idea of coupling the logic of the two modules like that. But I'm not a dataclasses maintainer, so you should probably listen to Carl and Eric over me on this.

Anyway, the test, if you do add it, should go somewhere in Lib/test/test_dataclasses.py.

Member

AlexWaygood commented Mar 24, 2023 (edited)

> With regard to the 'simpler' version, while it is a cleaner change I'm not sure it's the right decision to give up on the performance improvement that we can get now for the potential that changes to the interpreter might get some of it back in the future?
> However I recognise I'm also not the one who has to maintain this and I do understand that it is a bit messier so I'll make the change if that's still requested. (The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front).

Another way of looking at that is: should we make the code significantly less readable now for an optimisation based on an implementation detail of the current interpreter, that could change at any time? Once the code is checked in, it will be hard to justify changing it in the future for readability reasons, as we highly value code stability in the CPython repo -- we generally only make changes if they fix user-visible bugs or are a meaningful performance improvement. The readability loss could be permanent.

I would still favour the simpler code. I'll let @carljm explain the meaning of his thumbs-up ;)

Member

carljm commented Mar 24, 2023

> (The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front).

Sorry, my mistake! I've removed the thumbs-up to clarify :) Somehow the first time around I missed the second sentence of that comment, and only saw the first part saying that you had already done the benchmark. I have no explanation for how I managed to miss the second sentence, considering I was actually a bit confused why you were mentioning that you'd already done the benchmark without offering any further conclusions!

> Another way of looking at that is: should we make the code significantly less readable now for an optimisation based on an implementation detail of the current interpreter, that could change at any time? Once the code is checked in, it will be hard to justify changing it in the future for readability reasons, as we highly value code stability in the CPython repo -- we generally only make changes if they fix user-visible bugs or are a meaningful performance improvement. The readability loss could be permanent.
>
> I would still favour the simpler code.

I agree with this. There is always a balance between readability and performance, and we don't always prefer any detectable improvement in performance at any cost in readability and maintainability. We can get most of the wins here with much nicer code.

> I'm sort-of -0 on importing anything private from copy.py, whether it's for dataclasses itself or a dataclasses test. I don't like the idea of coupling the logic of the two modules like that.

Logically, the coupling is introduced by the optimization itself; the test is just verifying the assumptions of the optimization aren't broken. That said, I think it's very unlikely that deepcopy would ever reduce the set of objects that it treats as atomic, and it's more likely they might rearrange the details of how that dispatch happens, so probably the test would be much more likely to fail for spurious and annoying reasons than for real ones. So I've changed my mind: forget the test :)

DavidCEllis added 2 commits

March 24, 2023 20:25

Revert function call skip for _asdict_inner

0209f3c

Revert function call skip for _astuple_inner

986371c

AlexWaygood reviewed

Mar 24, 2023

Member

AlexWaygood left a comment

Looks very good to me now, but still needs a news entry!

DavidCEllis changed the title from "gh-103000: Optimise dataclasses asdict/astuple for common types and the default dict_factory" to "gh-103000: Optimise dataclasses asdict/astuple for common types"

Mar 24, 2023

📜🤖 Added by blurb_it.

801af82

Contributor, Author

DavidCEllis commented Mar 24, 2023

I think that's all the comments covered?

I'm not sure I'll do a PR for the dict_factory special case; I think that was less significant than skipping the function call overhead. Logically you'd now probably write it as a comprehension (with the inline checks this was much more awkward), and I don't remember that getting as much improvement (perhaps when PEP 709 lands).
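A hedged sketch of what a dict_factory special case written as a comprehension could look like. This is not the code that was removed from the PR in b77c17e; the helper names build and convert are hypothetical, and convert stands in for the recursive conversion of each field value.

```python
from collections import OrderedDict
from dataclasses import dataclass, fields


def convert(value):
    """Placeholder for the recursive per-value conversion (identity here)."""
    return value


def build(obj, dict_factory=dict):
    """Build a mapping of field names to converted values."""
    if dict_factory is dict:
        # Special case: construct the dict directly with a comprehension,
        # avoiding the intermediate list of (name, value) pairs.
        return {f.name: convert(getattr(obj, f.name)) for f in fields(obj)}
    # General case: collect pairs and hand them to the user's factory.
    pairs = [(f.name, convert(getattr(obj, f.name))) for f in fields(obj)]
    return dict_factory(pairs)


@dataclass
class Point:
    x: int
    y: int


plain = build(Point(1, 2))
ordered = build(Point(1, 2), OrderedDict)
```

Both branches produce the same key/value pairs; the special case only changes how the default dict is assembled.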

Contributor, Author

DavidCEllis commented Mar 24, 2023

I have made the requested changes; please review again

bedevere-bot added awaiting change review and removed awaiting changes labels

Mar 24, 2023

bedevere-bot commented Mar 24, 2023

Thanks for making the requested changes!

@carljm: please review the changes made to this pull request.

bedevere-bot requested a review from carljm

March 24, 2023 21:20

Member

ericvsmith commented Mar 24, 2023

I'm basically okay with this, although as I mentioned on discuss.python.org I'm concerned about maintaining the list of types here and not in copy. This seems to be a generally useful optimization, so why enable it only here? Should every module that wants to do something similar maintain its own copy of _ATOMIC_TYPES?

Contributor, Author

DavidCEllis commented Mar 24, 2023

The way I look at it now is that it's not really about deepcopy. deepcopy ignoring these objects permits us to special case and return them, but if they weren't useful types it would be irrelevant.

The overlap with the types the stdlib json can handle and the other potentially common types (complex, bytes) is what made this seem worthwhile to me. The remaining types are mostly there because I didn't have a compelling reason not to include them? Including everything at least fit the pattern of 'ignored by deepcopy' rather than being an arbitrary subset that I had chosen (and I definitely didn't think I should be the one choosing).
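The overlap with the stdlib json module mentioned above can be checked directly. A small sketch (the helper name json_serializable is made up for illustration) showing which of the common scalar types json.dumps accepts with its default encoder:

```python
import json


def json_serializable(value):
    """Return True if json.dumps accepts the value with default settings."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


# One sample value per type, keyed by the type's name.
samples = (None, True, 1, 1.5, "text", 1j, b"bytes")
results = {type(v).__name__: json_serializable(v) for v in samples}
```

With the default encoder, None, bool, int, float and str serialise, while complex and bytes raise TypeError, which matches the correction made in commit 7343fe6.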

Member

carljm commented Mar 27, 2023

> I'm concerned about maintaining the list of types here and not in copy. This seems to be a generally useful optimization, why enable it only here? Should every module that wants to do something similar maintain it's own copy of _ATOMIC_TYPES?

I think this is mostly a dataclasses-specific optimization, where one component of it is avoiding an unnecessary call to deepcopy. A chunk of the win comes from short-circuiting the other type checks done by _as*_inner before it even tries deepcopy. Dataclasses, like deepcopy, is implementing a recursive algorithm that should descend only into "non-atomic" types, which is why this optimization can do "double duty" for dataclasses. I doubt it would make sense for other callers of deepcopy who aren't in a similar situation to bother with a pre-check for atomic types.

Given that I don't see other similar uses of deepcopy in the stdlib that would benefit from this, I'm not sure it's worth exposing a new public attribute for this on the copy module? Duplication seems OK here. But I'm also not opposed to making the change in copy to expose it.
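The "descend only into non-atomic types" behaviour is observable through the public API. A small illustration using dataclasses.asdict, with made-up class names: atomic values come back as the same object, while containers and nested dataclasses are rebuilt.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Inner:
    values: list = field(default_factory=list)


@dataclass
class Outer:
    name: str
    inner: Inner


obj = Outer("example", Inner([1, 2, 3]))
d = asdict(obj)

# The str value is atomic, so the very same object appears in the result.
same_name = d["name"] is obj.name

# The list is not atomic: asdict builds a new list with equal contents.
same_list = d["inner"]["values"] is obj.inner.values
```

Mutating d["inner"]["values"] afterwards therefore leaves obj untouched, which is the deep-copy guarantee the review comments want the fast path to preserve.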

carljm approved these changes

Mar 27, 2023

Member

carljm left a comment

Code changes LGTM here. Will defer to @ericvsmith on whether he wants copy to expose the atomic-types list instead.

bedevere-bot added awaiting merge and removed awaiting change review labels

Mar 27, 2023

Member

carljm commented Apr 5, 2023

@ericvsmith considering my comment above, do you still feel that this should be updated to expose a list of atomic types as an attribute of the copy module?

AlexWaygood reviewed

Apr 5, 2023

Misc/NEWS.d/next/Library/2023-03-24-20-49-48.gh-issue-103000.6eVNZI.rst (outdated, resolved)

AlexWaygood approved these changes

Apr 5, 2023

Member

AlexWaygood left a comment (edited)

This LGTM, other than my small nit about the NEWS entry.

> Given that I don't see other similar uses of deepcopy in the stdlib that would benefit from this, I'm not sure it's worth exposing a new public attribute for this on the copy module? Duplication seems OK here. But I'm also not opposed to making the change in copy to expose it.

I agree with @carljm here. I think a large part of this PR is about avoiding the inner loop in _asdict_inner and _astuple_inner, rather than any overhead in the copy module. It's true that exposing the _ATOMIC_TYPES list in the copy module might mean that other code could then add similar optimisations more easily. But I think that can be considered separately -- I'd advocate merging this now, and then considering whether to enhance the copy module API in a followup issue.

Member

ericvsmith commented Apr 10, 2023

@carljm : I'm okay with just leaving it in dataclasses, at least for starters. We can always move it in the future if needed. Feel free to merge this if you have the time and inclination.

Update Misc/NEWS.d/next/Library/2023-03-24-20-49-48.gh-issue-103000.6…

903b2fb

…eVNZI.rst

Member

AlexWaygood commented Apr 10, 2023

The docs failures are a known issue, and are unrelated to this PR.

Thanks @DavidCEllis, this is a great speedup!

AlexWaygood merged commit d034590 into python:main

Apr 10, 2023

bedevere-bot removed the awaiting merge label

Apr 10, 2023

DavidCEllis deleted the faster_dataclasses_serialize branch

April 10, 2023 22:07

warsaw pushed a commit to warsaw/cpython that referenced this pull request

Apr 11, 2023

pythongh-103000: Optimise dataclasses asdict/astuple for common types (…

dd85475

…python#103005)

Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

aisk pushed a commit to aisk/cpython that referenced this pull request

Apr 18, 2023

pythongh-103000: Optimise dataclasses asdict/astuple for common types (…

96fdaa7

…python#103005)

Co-authored-by: Carl Meyer <carl@oddbird.net>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

eendebakpt mentioned this pull request

Jan 18, 2024

gh-114264: Optimize performance of copy.deepcopy by adding a fast path for atomic types #114266

Merged

5 participants

gh-103000: Optimise dataclasses asdict/astuple for common types #103005