Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.1k
gh-103000: Optimise dataclasses asdict/astuple for common types#103005
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
gh-103000: Optimise dataclasses asdict/astuple for common types#103005
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
One idea for a test that could be added here: we could add a test that dataclasses'_ATOMIC_TYPES
is a subset of deepcopy's. This would protect against a future change to either of those lists breaking the semantic consistency that everything is deep-copied in asdict/astuple.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
I think this PR should include a news entry for the performance improvement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Requesting the mentioned inline changes, the suggested test, the news entry, and the change to the simpler version that tests against_ATOMIC_TYPES
just once at the top of the functions.
Thanks for finding and working on this optimization! Quite an impressive perf win either way.
bedevere-bot commentedMar 24, 2023
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Improve comment describing _ATOMIC_TYPESCo-authored-by: Carl Meyer <carl@oddbird.net>
Remove comment referring to weakrefCo-authored-by: Carl Meyer <carl@oddbird.net>
…by the python stdlib json module).
A few unresolved things:
With regard to the 'simpler' version, while it is a cleaner change I'm not sure it's the right decision to give up on the performance improvement that we can getnow for the potential that changes to the interpreter might get some of it back in the future? However I recognise I'm also not the one who has to maintain this and I do understand that it is a bit messier so I'll make the change if that's still requested. (The thumbs up on my comment explaining why I hadn't just done that has left me a bit confused on that front). |
AlexWaygood commentedMar 24, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I'm sort-of -0 on importing anything private from Anyway, the test, if you do add it, should go somewhere in |
AlexWaygood commentedMar 24, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Another way of looking at that is: should we make the code significantly less readablenow for an optimisation based on an implementation detail of the current interpreter, that could change at any time? Once the code is checked in, it will be hard to justify changing it in the future for readability reasons, as we highly value code stability in the CPython repo -- we generally only make changes if they fix user-visible bugs or are a meaningful performance improvement. The readability loss could be permanent. I would still favour the simpler code. I'll let@carljm explain the meaning of his thumbs-up ;) |
Sorry, my mistake! I've removed the thumbs-up to clarify :) Somehow the first time around I missed the second sentence of that comment, and only saw the first part saying that you had already done the benchmark. I have no explanation for how I managed to miss the second sentence, considering I was actually a bit confused why you were mentioning that you'd already done the benchmark without offering any further conclusions!
I agree with this. There is always a balance between readability and performance, and we don't always prefer any detectable improvement in performance at any cost in readability and maintainability. We can get most of the wins here with much nicer code.
Logically, the coupling is introduced by the optimization itself; the test is just verifying the assumptions of the optimization aren't broken. That said, I think it's very unlikely that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks very good to me now, but still needs a news entry!
I think that's all the comments covered? I'm not sure I'll do a PR for the dict_factory special case, I think that was less significant than skipping the function call overhead? Logically now you'd probably write it as a comprehension (with the inline checks this was much more awkward) and I don't remember that getting as much improvement (perhaps when PEP709 lands). |
I have made the requested changes; please review again |
bedevere-bot commentedMar 24, 2023
Thanks for making the requested changes! @carljm: please review the changes made to this pull request. |
I'm basically okay with this, although as I mentioned on discuss.python.org I'm concerned about maintaining the list of types here and not in |
The way I look at it now is that it's notreally about The overlap with the types the stdlib |
I think this is mostly a dataclasses-specific optimization, where one component of it is avoiding an unnecessary call to deepcopy. A chunk of the win comes from short-circuiting the other type checks done by Given that I don't see other similar uses of deepcopy in the stdlib that would benefit from this, I'm not sure it's worth exposing a new public attribute for this on the copy module? Duplication seems OK here. But I'm also not opposed to making the change in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Code changes LGTM here. Will defer to@ericvsmith on whether he wantscopy
to expose the atomic-types list instead.
@ericvsmith considering my comment above, do you still feel that this should be updated to expose a list of atomic types as an attribute of the |
Misc/NEWS.d/next/Library/2023-03-24-20-49-48.gh-issue-103000.6eVNZI.rst OutdatedShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
AlexWaygood left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This LGTM, other than my small nit about the NEWS entry.
Given that I don't see other similar uses of deepcopy in the stdlib that would benefit from this, I'm not sure it's worth exposing a new public attribute for this on the copy module? Duplication seems OK here. But I'm also not opposed to making the change in
copy
to expose it.
I agree with@carljm here. I think a large part of this PR is about avoiding the inner loop in_asdict_inner
and_astuple_inner
, rather than any overhead in thecopy
module. It's true that exposing the_ATOMIC_TYPES
list in thecopy
modulemight mean that other code could then add similar optimisations more easily. But I think that can be considered separately -- I'd advocate merging this now, and then considering whether to enhance thecopy
module API in a followup issue.
@carljm : I'm okay with just leaving it in dataclasses, at least for starters. We can always move it in the future if needed. Feel free to merge this if you have the time and inclination. |
The docs failures are a known issue, and are unrelated to this PR. Thanks@DavidCEllis, this is a great speedup! |
…python#103005)Co-authored-by: Carl Meyer <carl@oddbird.net>Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
…python#103005)Co-authored-by: Carl Meyer <carl@oddbird.net>Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Uh oh!
There was an error while loading.Please reload this page.
PR for issue#103000
I'm not attached to
_ATOMIC_TYPES
as the name for the set but 'atomic' matched how they're referred to incopy
and I didn't have anything more suitable.