gh-114264: Optimize performance of copy.deepcopy by adding a fast path for atomic types #114266


Merged

Conversation

@eendebakpt (Contributor) commented Jan 18, 2024 (edited by bedevere-app bot):

By adding a fast path for atomic types we eliminate the overhead of checking the memo argument of deepcopy and the overhead of calling the dispatch method.

Benchmark:

deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [pr] 4.65 us +- 0.06 us: 1.48x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [pr] 5.23 us +- 0.07 us: 1.27x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [pr] 924 ns +- 33 ns: 1.16x faster
Geometric mean: 1.30x faster

Benchmark script:

import pyperf

runner = pyperf.Runner()
setup = """
import copy
a = {'list': [1, 2, 3, 43], 't': (1, 2, 3), 'str': 'hello', 'subdict': {'a': True}}

from dataclasses import dataclass

@dataclass
class A:
    a: list
    b: str
    c: bool

dc = A([1, 2, 3], 'hello', True)
small_tuple = (1, )
"""
runner.timeit(name="deepcopy dict", stmt="b=copy.deepcopy(a)", setup=setup)
runner.timeit(name="deepcopy dataclass", stmt="b=copy.deepcopy(dc)", setup=setup)
runner.timeit(name="deepcopy small tuple", stmt="b=copy.deepcopy(small_tuple)", setup=setup)

The approach is similar to the one used in #103005.
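To make the idea concrete, here is a minimal sketch of the fast path (not the actual patch; the contents of _atomic_types below are illustrative and simply mirror the kinds of types the copy module already treats as immutable):

```python
# Sketch only: illustrates the fast-path idea, not the exact patch.
import copy

# Illustrative set of "atomic" (immutable, scalar-like) types.
_atomic_types = frozenset({type(None), int, float, bool, complex, bytes, str, type, range})

def fast_deepcopy(x, memo=None):
    if type(x) in _atomic_types:
        # Immutable atomic objects are their own deep copy, so they can be
        # returned without touching the memo or the dispatch table.
        return x
    # Everything else goes through the regular deepcopy machinery.
    return copy.deepcopy(x, memo)
```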

Comment on lines +124 to +128:

    cls = type(x)

    if cls in _atomic_types:
        return x

Member:

Adding a fast path here would cause a potential behavioral change if memo is involved. This should be mentioned in NEWS.


The safer way might be to simply keep the original behavior: put the fast path after the memo check. This is a user-observable change and it might matter, for example if the user is using the memo dict to keep track of all the objects copied.

It would also be interesting to see the benchmark if we keep the memo as it is: how much of the performance gain comes from not updating the memo?
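A hedged sketch of the ordering being suggested here (memo lookup first, atomic fast path second); the function name, the _atomic_types set and the fallback to copy.deepcopy are purely illustrative:

```python
import copy

_atomic_types = frozenset({type(None), int, float, bool, complex, bytes, str, type, range})

def deepcopy_memo_first(x, memo=None, _nil=[]):
    # Consult a user-supplied memo first, as current main does, so that memo
    # hits keep winning over the shortcut ...
    if memo is not None:
        y = memo.get(id(x), _nil)
        if y is not _nil:
            return y
    # ... and only then take the atomic fast path.
    if type(x) in _atomic_types:
        return x
    # Sketch only: defer to the real implementation for everything else.
    return copy.deepcopy(x, memo)
```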

@eendebakpt (Contributor, Author):

Putting the fast path after the memo results in the following (main vs. the variation):

deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [pr_v2] 6.06 us +- 0.08 us: 1.13x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [pr_v2] 6.43 us +- 0.17 us: 1.03x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [pr_v2] 1.01 us +- 0.02 us: 1.06x faster
Geometric mean: 1.08x faster

Still worthwhile, but a smaller improvement.

I am trying to find a case where the behavior changes (so I can also add a test). But the atomic types are not added to the memo:

...
    # If is its own copy, don't memoize.
    if y is not x:
        memo[d] = y
        _keep_alive(x, memo)  # Make sure x lives at least as long as d
...

The only case I could think of where behaviour changes is when users supply their own memo, and then the behavior change (a different id for the same string) could be considered an implementation detail:

import copy

# large str, so it is not interned
s = 'sds' * 12312312
s2 = 'sds' * 12312312
print(id(s), id(s2))  # different id's

memo = {id(s2): s}
t = copy.deepcopy(s2, memo)
print(id(t), id(s), id(s) == id(t), id(s2))  # with this PR id(s) and id(t) are not equal, although s and t are equal as strings

Are there any cases where behavior changes that I am missing?


@serhiy-storchaka (Member):

I wonder how the times will change if you use "if copier is _deepcopy_atomic" instead of "if cls in _atomic_types"? You can also try to use a different special value, for example ..., instead of _deepcopy_atomic, to avoid a lookup in globals.
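A hedged sketch of what this variant could look like, using Ellipsis (...) as one possible special value; the dispatch contents and the helper name are illustrative only:

```python
import copy

# Illustrative: atomic types are registered with a sentinel (Ellipsis) instead of
# a _deepcopy_atomic function, so the test becomes an identity comparison and
# avoids a lookup in globals.
_deepcopy_dispatch = {t: ... for t in (type(None), int, float, bool, complex, bytes, str, type)}

def deepcopy_sentinel(x, memo=None):
    copier = _deepcopy_dispatch.get(type(x))
    if copier is ...:
        # Atomic type: it is its own deep copy, and nothing is memoized.
        return x
    # Sketch only: defer to the real implementation for all other types.
    return copy.deepcopy(x, memo)
```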

@eendebakpt (Contributor, Author):

@serhiy-storchaka That is an interesting suggestion. A microbenchmark shows getting the copier is not much more expensive than the cls in _atomic_types check:

%timeit cls in _atomic_types
43.7 ns ± 0.0706 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

%timeit _deepcopy_dispatch.get(cls)
71.5 ns ± 0.0873 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
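For reference, a rough stdlib-only reproduction of the two lookups being timed above; the set and dispatch dict here are stand-ins for the module-level structures in Lib/copy.py, so the absolute numbers will differ:

```python
import timeit

# Stand-ins for copy._atomic_types and copy._deepcopy_dispatch (illustrative only).
_atomic_types = {type(None), int, float, bool, complex, bytes, str, type}
_deepcopy_dispatch = {t: (lambda x, memo: x) for t in _atomic_types}

cls = str
n = 10_000_000
t_set = timeit.timeit("cls in _atomic_types", globals=globals(), number=n)
t_get = timeit.timeit("_deepcopy_dispatch.get(cls)", globals=globals(), number=n)
print(f"set membership: {t_set / n * 1e9:.1f} ns per lookup")
print(f"dispatch .get : {t_get / n * 1e9:.1f} ns per lookup")
```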

Using a special value for the copier can be done after the memo check. This is quite close to current main and has the following performance:

deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [v3] 6.36 us +- 0.11 us: 1.08x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [v3] 6.26 us +- 0.05 us: 1.06x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v3] 1.02 us +- 0.01 us: 1.05x faster
Geometric mean: 1.06x faster

(implementation is: main...eendebakpt:deepcopy_atomic_types_v3)

Using the same approach before the memo check has performance:

deepcopy dict: Mean +- std dev: [main] 6.88 us +- 0.08 us -> [v4] 4.96 us +- 0.08 us: 1.39x faster
deepcopy dataclass: Mean +- std dev: [main] 6.64 us +- 0.11 us -> [v4] 5.38 us +- 0.06 us: 1.23x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v4] 931 ns +- 10 ns: 1.15x faster
Geometric mean: 1.25x faster

(implementation: main...eendebakpt:deepcopy_atomic_types_v4)


But the atomic types are not added to the memo:

Ah, you are right, I missed that. In that case the impact is much smaller than I thought, and I think it's okay to skip the memo part.

Member:

I am okay with both solutions, provided the memo change is mentioned (if there is one). Users can use this function in surprising ways, and it's a de facto public API.

@serhiy-storchaka (Member):

Thank you for satisfying my curiosity. I see that there is a non-trivial benefit from using _atomic_types.

Could you please make benchmarks for a collection containing a large number of non-atomic identical objects, e.g. [[1]]*100, for different variants of the code? I expect that some variants can have a regression, but if it is not large, we can accept this.

I also wonder whether the same approach (with an _immutable_types set) should be used in copy.copy(). Even if it does not use memo, there may be some benefit, and it could be better for uniformity of the code.
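As an illustration of what such a shortcut in copy.copy could look like (a sketch under the assumption of an illustrative _immutable_types set; this is not part of the PR):

```python
import copy

_immutable_types = frozenset({type(None), int, float, bool, complex, bytes, str, type, range})

def fast_copy(x):
    # Sketch: return immutable objects directly, before consulting the copy
    # dispatch table. copy.copy has no memo, so only the dispatch is saved.
    if type(x) in _immutable_types:
        return x
    return copy.copy(x)
```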

@eendebakpt (Contributor, Author):

A more extensive benchmark script:

import pyperf

runner = pyperf.Runner()
setup = """
import copy
a = {'list': [1, 2, 3, 43], 't': (1, 2, 3), 'str': 'hello', 'subdict': {'a': True}}

from dataclasses import dataclass

@dataclass
class A:
    a: list
    b: str
    c: bool

dc = A([1, 2, 3], 'hello', True)

@dataclass
class A:
    a: int

dc_small = A(123)
small_tuple = (1, )
l = {'hi': 100}
repeating_atomic = [ [1] * 100]
repeating = [dc_small] * 100
"""
runner.timeit(name="deepcopy dict", stmt="b=copy.deepcopy(a)", setup=setup)
runner.timeit(name="deepcopy dataclass", stmt="b=copy.deepcopy(dc)", setup=setup)
runner.timeit(name="deepcopy small dataclass", stmt="b=copy.deepcopy(dc_small)", setup=setup)
runner.timeit(name="deepcopy small tuple", stmt="b=copy.deepcopy(small_tuple)", setup=setup)
runner.timeit(name="deepcopy repeating", stmt="b=copy.deepcopy(repeating)", setup=setup)
runner.timeit(name="deepcopy repeating_atomic", stmt="b=copy.deepcopy(repeating_atomic)", setup=setup)

Comparison of the three alternative versions to main. This PR:

deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [pr] 4.68 us +- 0.08 us: 1.46x faster
deepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [pr] 5.23 us +- 0.07 us: 1.26x faster
deepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [pr] 3.88 us +- 0.08 us: 1.11x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [pr] 925 ns +- 12 ns: 1.16x faster
deepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [pr] 24.9 us +- 0.4 us: 1.15x slower
deepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [pr] 10.1 us +- 0.2 us: 2.44x faster
Geometric mean: 1.31x faster

main...eendebakpt:deepcopy_atomic_types_v3

deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [v3] 6.34 us +- 0.22 us: 1.08x faster
deepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [v3] 6.21 us +- 0.11 us: 1.06x faster
deepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [v3] 4.16 us +- 0.04 us: 1.03x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v3] 1.03 us +- 0.02 us: 1.03x faster
deepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [v3] 22.0 us +- 0.6 us: 1.01x slower
deepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [v3] 21.2 us +- 0.6 us: 1.16x faster
Geometric mean: 1.06x faster

main...eendebakpt:deepcopy_atomic_types_v4

deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [v4] 4.93 us +- 0.05 us: 1.38x faster
deepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [v4] 5.35 us +- 0.06 us: 1.23x faster
deepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [v4] 3.89 us +- 0.05 us: 1.11x faster
deepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [v4] 938 ns +- 25 ns: 1.14x faster
deepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [v4] 26.4 us +- 0.5 us: 1.21x slower
deepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [v4] 12.4 us +- 0.4 us: 1.99x faster
Geometric mean: 1.23x faster

Some notes:

  • In this PR we avoid the cost of looking into the memo for atomic types, which provides the main speedup. This comes at the cost of a performance degradation for objects containing many identical (same id) values (see the small illustration after these notes). Based on the numbers above there is no performance reason to pick version v4. For clarity or uniformity of the code it would not hurt much to replace the PR with v4, though. I think one could construct benchmarks where v3 is faster than this PR (thinking about non-atomic types which have a custom __copy__ and recursive calls to deepcopy, such as numpy arrays), but I expect the differences to be small. Version v3 has the same performance on the "deepcopy repeating" case as main (the 1.01x slower is a random fluctuation, I believe; on average it will be 1.00x) and a small performance improvement for some other cases.

  • I also considered aligning the implementations of copy.copy and copy.deepcopy (for uniformity of the code), but decided against this initially. My line of thought:

    • There is no memo that can be skipped (which was the main performance benefit).
    • Calls of copy.copy on atomic types should be relatively rare (and are cheap anyway), so there is less to gain.
    • For calls on container types (like list, dict or set) there are no recursive calls on the components. This is in contrast with deepcopy, where a call on a dataclass can recurse into atomic types.
  • I have not updated the news entry yet with a description of the behaviour changes, because I think the behavior changes are not really something one can notice with normal use of deepcopy.
    i) The first behavior change is that the number of invocations of memo.get is lower. But memo.get is a read-only operation (on normal dicts), so this is not visible from the public interface.
    ii) In the example given above the memo passed is memo = {id(s2): s}. This memo is not really a valid memo, since we require id(value) == key for each key-value pair in the memo, which is not true in the example.
    If desired I can add this to the news entry though.

  • The results above are only micro-benchmarks, and it will depend on the application which version performs best. Applications where I have found deepcopy to take a significant amount of time are lmfit, qiskit and quantify-scheduler, but I cannot run those with current main (I could with 3.12 though).

  • There is another PR improving the speed of deepcopy (gh-72793: C implementation of parts of copy.deepcopy #91610). That approach converts the Python implementation to C, but it is more complex and will take time to review (and might not be accepted).
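As a small illustration of the "deepcopy repeating" trade-off from the first note: for a list of many references to the same non-atomic object, the memo turns all but the first copy into cheap dict hits and preserves the shared identity, so any per-element work done before the memo lookup (such as the atomic-type check) is paid once per element:

```python
import copy

inner = [1]
repeating = [inner] * 100        # 100 references to the same non-atomic object

out = copy.deepcopy(repeating)
assert out[0] is out[99]         # shared identity is preserved via the memo
assert out[0] is not inner       # but the shared element maps to one new copy
assert out == repeating
```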

@gaogaotiantian (Member) left a comment:

I think this is pretty straightforward and we can get some easy gain.

@gaogaotiantian (Member):

I think the current solution is simple and effective. The benchmark shows good results. I did not see a serious downside. All the existing tests passed. You'll need a core dev's approval to merge it in, though.


@erlend-aasland (Contributor) left a comment:

LGTM, but let's await Serhiy's thumbs-up for this, since he had remarks earlier.

@eendebakpt (Contributor, Author):

The _nil argument of deepcopy can be removed with this change (None works as a sentinel, since the only keys for the memo are of type int):

diff --git a/Lib/copy.py b/Lib/copy.py
index 7a1907d754..4566e47a0e 100644
--- a/Lib/copy.py
+++ b/Lib/copy.py
@@ -115,7 +115,7 @@ def _copy_immutable(x):
 
 del d, t
 
-def deepcopy(x, memo=None, _nil=[]):
+def deepcopy(x, memo=None):
     """Deep copy operation on arbitrary Python objects.
 
     See the module's __doc__ string for more info.
@@ -130,8 +130,8 @@ def deepcopy(x, memo=None, _nil=[]):
     if memo is None:
         memo = {}
     else:
-        y = memo.get(d, _nil)
-        if y is not _nil:
+        y = memo.get(d, None)
+        if y is not None:
             return y
 
     copier = _deepcopy_dispatch.get(cls)

Performance should be a bit better (since there is no assignment to _nil), but the impact is small (to the extent that it is hard to measure). It does remove the _nil argument, which is nice, but it would add a little bit of churn to the PR. Unless everyone is +1 on this change I will leave it out.

@erlend-aasland (Contributor):

@serhiy-storchaka, do you have time for a last look at this? It would be nice to have your thumbs-up before I proceed.

@serhiy-storchaka (Member) left a comment:

LGTM.

serhiy-storchaka merged commit 9d66042 into python:main on Jun 7, 2024
33 checks passed
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request on Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request on Jul 17, 2024
Reviewers

@serhiy-storchaka approved these changes
@gaogaotiantian approved these changes
@erlend-aasland approved these changes
Awaiting requested review from @sunmy2019

5 participants: @eendebakpt, @gaogaotiantian, @erlend-aasland, @serhiy-storchaka, @sunmy2019
