Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-103000: Optimise dataclasses asdict/astuple for common types#103005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged

Conversation

DavidCEllis
Copy link
Contributor

@DavidCEllisDavidCEllis commentedMar 24, 2023
edited by bedevere-bot
Loading

PR for issue#103000

I'm not attached to_ATOMIC_TYPES as the name for the set but 'atomic' matched how they're referred to incopy and I didn't have anything more suitable.

bluss reacted with rocket emoji
@AlexWaygoodAlexWaygood added type-featureA feature request or enhancement performancePerformance or resource usage stdlibPython modules in the Lib dir 3.12only security fixes labelsMar 24, 2023
Copy link
Member

@carljmcarljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

One idea for a test that could be added here: we could add a test that dataclasses'_ATOMIC_TYPES is a subset of deepcopy's. This would protect against a future change to either of those lists breaking the semantic consistency that everything is deep-copied in asdict/astuple.

@carljm
Copy link
Member

I think this PR should include a news entry for the performance improvement.

AlexWaygood reacted with thumbs up emoji

Copy link
Member

@carljmcarljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Requesting the mentioned inline changes, the suggested test, the news entry, and the change to the simpler version that tests against_ATOMIC_TYPES just once at the top of the functions.

Thanks for finding and working on this optimization! Quite an impressive perf win either way.

AlexWaygood reacted with thumbs up emoji
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

DavidCEllisand others added5 commitsMarch 24, 2023 18:09
Improve comment describing _ATOMIC_TYPESCo-authored-by: Carl Meyer <carl@oddbird.net>
Remove comment referring to weakrefCo-authored-by: Carl Meyer <carl@oddbird.net>
@DavidCEllis
Copy link
ContributorAuthor

A few unresolved things:

  • dict_factory=dict optimisation will now be done in a separate follow-up PR.
  • If I make a test for _ATOMIC_TYPES is there a specific spot it whould go?
  • In the discussion thread Eric posted about gettingcopy to handle/provide the list/set of types
    • I think that's out of scope for this change and concealswhy these types matter, but maybe worth looking into if it would be useful elsewhere
    • This could always be changed to use such a set if it was provided and if that was the right choice

With regard to the 'simpler' version, while it is a cleaner change I'm not sure it's the right decision to give up on the performance improvement that we can getnow for the potential that changes to the interpreter might get some of it back in the future?

However I recognise I'm also not the one who has to maintain this and I do understand that it is a bit messier so I'll make the change if that's still requested. (The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front).

@AlexWaygood
Copy link
Member

AlexWaygood commentedMar 24, 2023
edited
Loading

If I make a test for _ATOMIC_TYPES is there a specific spot it whould go?

I'm sort-of -0 on importing anything private fromcopy.py, whether it's fordataclasses itself or adataclasses test. I don't like the idea of coupling the logic of the two modules like that.But I'm not adataclasses maintainer, so you should probably listen to Carl and Eric over me on this.

Anyway, the test, if you do add it, should go somewhere inLib/test/test_dataclasses.py.

@AlexWaygood
Copy link
Member

AlexWaygood commentedMar 24, 2023
edited
Loading

With regard to the 'simpler' version, while it is a cleaner change I'm not sure it's the right decision to give up on the performance improvement that we can getnow for the potential that changes to the interpreter might get some of it back in the future?

However I recognise I'm also not the one who has to maintain this and I do understand that it is a bit messier so I'll make the change if that's still requested. (The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front).

Another way of looking at that is: should we make the code significantly less readablenow for an optimisation based on an implementation detail of the current interpreter, that could change at any time? Once the code is checked in, it will be hard to justify changing it in the future for readability reasons, as we highly value code stability in the CPython repo -- we generally only make changes if they fix user-visible bugs or are a meaningful performance improvement. The readability loss could be permanent.

I would still favour the simpler code. I'll let@carljm explain the meaning of his thumbs-up ;)

@carljm
Copy link
Member

(The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front).

Sorry, my mistake! I've removed the thumbs-up to clarify :) Somehow the first time around I missed the second sentence of that comment, and only saw the first part saying that you had already done the benchmark. I have no explanation for how I managed to miss the second sentence, considering I was actually a bit confused why you were mentioning that you'd already done the benchmark without offering any further conclusions!

Another way of looking at that is: should we make the code significantly less readable now for an optimisation based on an implementation detail of the current interpreter, that could change at any time? Once the code is checked in, it will be hard to justify changing it in the future for readability reasons, as we highly value code stability in the CPython repo -- we generally only make changes if they fix user-visible bugs or are a meaningful performance improvement. The readability loss could be permanent.

I would still favour the simpler code.

I agree with this. There is always a balance between readability and performance, and we don't always prefer any detectable improvement in performance at any cost in readability and maintainability. We can get most of the wins here with much nicer code.

I'm sort-of -0 on importing anything private from copy.py, whether it's for dataclasses itself or a dataclasses test. I don't like the idea of coupling the logic of the two modules like that.

Logically, the coupling is introduced by the optimization itself; the test is just verifying the assumptions of the optimization aren't broken. That said, I think it's very unlikely thatdeepcopy would ever reduce the set of objects that it treats as atomic, and it's more likely they might rearrange the details of how that dispatch happens, so probably the test would be much more likely to fail for spurious and annoying reasons than for real ones. So I've changed my mind: forget the test :)

AlexWaygood reacted with thumbs up emoji

Copy link
Member

@AlexWaygoodAlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks very good to me now, but still needs a news entry!

@DavidCEllisDavidCEllis changed the titlegh-103000: Optimise dataclasses asdict/astuple for common types and the default dict_factorygh-103000: Optimise dataclasses asdict/astuple for common typesMar 24, 2023
@DavidCEllis
Copy link
ContributorAuthor

I think that's all the comments covered?

I'm not sure I'll do a PR for the dict_factory special case, I think that was less significant than skipping the function call overhead? Logically now you'd probably write it as a comprehension (with the inline checks this was much more awkward) and I don't remember that getting as much improvement (perhaps when PEP709 lands).

@DavidCEllis
Copy link
ContributorAuthor

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@carljm: please review the changes made to this pull request.

@ericvsmith
Copy link
Member

I'm basically okay with this, although as I mentioned on discuss.python.org I'm concerned about maintaining the list of types here and not incopy. This seems to be a generally useful optimization, why enable it only here? Should every module that wants to do something similar maintain it's own copy of_ATOMIC_TYPES?

@DavidCEllis
Copy link
ContributorAuthor

The way I look at it now is that it's notreally aboutdeepcopy.deepcopy ignoring these objects permits us to special case and return them, but if they weren'tuseful types it would be irrelevant.

The overlap with the types the stdlibjson can handle and the other potentially common types (complex, bytes) are what made this seem worthwhile to me. The remaining types are mostly there because I didn't have a compelling reasonnot to include them? Including everything at least fit the pattern of 'ignored by deepcopy' rather than being an arbitrary subset that I had chosen (and I definitely didn't thinkI should be the one choosing).

AlexWaygood and carljm reacted with thumbs up emoji

@carljm
Copy link
Member

I'm concerned about maintaining the list of types here and not incopy. This seems to be a generally useful optimization, why enable it only here? Should every module that wants to do something similar maintain it's own copy of_ATOMIC_TYPES?

I think this is mostly a dataclasses-specific optimization, where one component of it is avoiding an unnecessary call to deepcopy. A chunk of the win comes from short-circuiting the other type checks done by_as*_inner before it even tries deepcopy. Dataclasses, like deepcopy, is implementing a recursive algorithm that should descend only into "non-atomic" types, which is why this optimization can do "double duty" for dataclasses. I doubt it would make sense for other callers of deepcopy who aren't in a similar situation to bother with a pre-check for atomic types.

Given that I don't see other similar uses of deepcopy in the stdlib that would benefit from this, I'm not sure it's worth exposing a new public attribute for this on the copy module? Duplication seems OK here. But I'm also not opposed to making the change incopy to expose it.

AlexWaygood reacted with thumbs up emoji

Copy link
Member

@carljmcarljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Code changes LGTM here. Will defer to@ericvsmith on whether he wantscopy to expose the atomic-types list instead.

@carljm
Copy link
Member

@ericvsmith considering my comment above, do you still feel that this should be updated to expose a list of atomic types as an attribute of thecopy module?

Copy link
Member

@AlexWaygoodAlexWaygood left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This LGTM, other than my small nit about the NEWS entry.

Given that I don't see other similar uses of deepcopy in the stdlib that would benefit from this, I'm not sure it's worth exposing a new public attribute for this on the copy module? Duplication seems OK here. But I'm also not opposed to making the change incopy to expose it.

I agree with@carljm here. I think a large part of this PR is about avoiding the inner loop in_asdict_inner and_astuple_inner, rather than any overhead in thecopy module. It's true that exposing the_ATOMIC_TYPES list in thecopy modulemight mean that other code could then add similar optimisations more easily. But I think that can be considered separately -- I'd advocate merging this now, and then considering whether to enhance thecopy module API in a followup issue.

@ericvsmith
Copy link
Member

@carljm : I'm okay with just leaving it in dataclasses, at least for starters. We can always move it in the future if needed. Feel free to merge this if you have the time and inclination.

carljm and AlexWaygood reacted with thumbs up emoji

@AlexWaygood
Copy link
Member

The docs failures are a known issue, and are unrelated to this PR.

Thanks@DavidCEllis, this is a great speedup!

DavidCEllis reacted with thumbs up emoji

@AlexWaygoodAlexWaygood merged commitd034590 intopython:mainApr 10, 2023
@DavidCEllisDavidCEllis deleted the faster_dataclasses_serialize branchApril 10, 2023 22:07
warsaw pushed a commit to warsaw/cpython that referenced this pull requestApr 11, 2023
…python#103005)Co-authored-by: Carl Meyer <carl@oddbird.net>Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
aisk pushed a commit to aisk/cpython that referenced this pull requestApr 18, 2023
…python#103005)Co-authored-by: Carl Meyer <carl@oddbird.net>Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@carljmcarljmcarljm approved these changes

@AlexWaygoodAlexWaygoodAlexWaygood approved these changes

@ericvsmithericvsmithAwaiting requested review from ericvsmithericvsmith is a code owner

Assignees
No one assigned
Labels
3.12only security fixesperformancePerformance or resource usagestdlibPython modules in the Lib dirtype-featureA feature request or enhancement
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

5 participants
@DavidCEllis@carljm@bedevere-bot@AlexWaygood@ericvsmith

[8]ページ先頭

©2009-2025 Movatter.jp