Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-72793: C implementation of parts of copy.deepcopy#91610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
eendebakpt wants to merge77 commits intopython:main
base:main
Choose a base branch
Loading
fromeendebakpt:deepcopy_c_implementation

Conversation

eendebakpt
Copy link
Contributor

@eendebakpteendebakpt commentedApr 16, 2022
edited
Loading

The original idea and implementation are from@Villemoes. The original issue has been inactive for a long time.

Benchmark on deepcopy of adict anddataclass:

deepcopy dict: Mean +- std dev: [b] 7.28 us +- 0.27 us -> [d] 982 ns +- 24 ns: 7.41x fasterdeepcopy dataclass: Mean +- std dev: [b] 6.49 us +- 0.07 us -> [d] 3.69 us +- 0.11 us: 1.76x fasterGeometric mean: 3.61x faster

Updated benchmark: (23-12-2022, main at1ecfd1e)

deepcopy dict: Mean +- std dev: [main] 6.56 us +- 0.07 us -> [pr] 821 ns +- 24 ns: 7.99x fasterdeepcopy dataclass: Mean +- std dev: [main] 6.12 us +- 0.07 us -> [pr] 3.10 us +- 0.09 us: 1.97x fasterGeometric mean: 3.97x faster

Updated benchmark: (16-9-2023, main ate57ecf6)

deepcopy dict: Mean +- std dev: [main] 7.91 us +- 0.22 us -> [pr] 958 ns +- 49 ns: 8.25x fasterdeepcopy dataclass: Mean +- std dev: [main] 7.74 us +- 0.26 us -> [pr] 3.65 us +- 0.27 us: 2.12x fasterGeometric mean: 4.19x faster

Updated benchmark (10-01-2024)

deepcopy dict: Mean +- std dev: [main] 6.82 us +- 0.07 us -> [c_impl] 922 ns +- 19 ns: 7.40x fasterdeepcopy dataclass: Mean +- std dev: [main] 6.59 us +- 0.07 us -> [c_impl] 3.34 us +- 0.06 us: 1.98x fasterdeepcopy small dataclass: Mean +- std dev: [main] 4.30 us +- 0.07 us -> [c_impl] 3.01 us +- 0.04 us: 1.43x fasterdeepcopy small tuple: Mean +- std dev: [main] 1.07 us +- 0.01 us -> [c_impl] 115 ns +- 1 ns: 9.30x fasterdeepcopy repeating: Mean +- std dev: [main] 21.8 us +- 0.2 us -> [c_impl] 8.66 us +- 0.28 us: 2.51x fasterdeepcopy repeating_atomic: Mean +- std dev: [main] 24.6 us +- 0.5 us -> [c_impl] 1.28 us +- 0.02 us: 19.16x fasterGeometric mean: 4.59x faster

Benchmark details

Test script

Updated test script

import pyperfrunner = pyperf.Runner()setup="""import copya={'list': [1,2,3,43], 't': (1,2,3), 'str': 'hello', 'subdict': {'a': True}}from dataclasses import dataclass@dataclassclass A:    a : list    b : str    c : bool    dc=A([1,2,3], 'hello', True)@dataclassclass A:    a : int    dc_small = A(123)small_tuple = (1, )l = {'hi': 100}repeating_atomic = [ [1] * 100]repeating = [dc_small] * 100"""runner.timeit(name="deepcopy dict", stmt=f"b=copy.deepcopy(a)", setup=setup)runner.timeit(name="deepcopy dataclass", stmt=f"b=copy.deepcopy(dc)", setup=setup)runner.timeit(name="deepcopy small dataclass", stmt=f"b=copy.deepcopy(dc_small)", setup=setup)runner.timeit(name="deepcopy small tuple", stmt=f"b=copy.deepcopy(small_tuple)", setup=setup)runner.timeit(name="deepcopy repeating", stmt=f"b=copy.deepcopy(repeating)", setup=setup)runner.timeit(name="deepcopy repeating_atomic", stmt=f"b=copy.deepcopy(repeating_atomic)", setup=setup)

Old test script:

import pyperfrunner = pyperf.Runner()setup="""import copya={'list': [1,2,3,43], 't': (1,2,3), 'str': 'hello', 'subdict': {'a': True}}from dataclasses import dataclass@dataclassclass A:    a : list    b : str    c : bool    dc=A([1,2,3], 'hello', True)"""runner.timeit(name=f"deepcopy dict", stmt=f"b=copy.deepcopy(a)", setup=setup)runner.timeit(name=f"deepcopy dataclass", stmt=f"b=copy.deepcopy(dc)", setup=setup)

Fixes#72793

Pyperformance results

Pyperformance results show small speedup, although this could very well be a random fluctuation. (there are no explicit calls to deepcopy in the pyperformance tests)

2to3: Mean +- std dev: [base] 325 ms +- 5 ms -> [patch] 322 ms +- 4 ms: 1.01x fasterchaos: Mean +- std dev: [base] 87.8 ms +- 0.8 ms -> [patch] 88.3 ms +- 1.1 ms: 1.01x slowerfloat: Mean +- std dev: [base] 103 ms +- 2 ms -> [patch] 98.5 ms +- 2.5 ms: 1.04x fastergo: Mean +- std dev: [base] 163 ms +- 1 ms -> [patch] 165 ms +- 2 ms: 1.01x slowerjson_dumps: Mean +- std dev: [base] 15.1 ms +- 0.5 ms -> [patch] 14.9 ms +- 0.1 ms: 1.01x fasterjson_loads: Mean +- std dev: [base] 28.0 us +- 0.3 us -> [patch] 28.1 us +- 0.3 us: 1.01x slowerlogging_format: Mean +- std dev: [base] 8.11 us +- 0.14 us -> [patch] 8.18 us +- 0.12 us: 1.01x slowerlogging_silent: Mean +- std dev: [base] 133 ns +- 1 ns -> [patch] 130 ns +- 1 ns: 1.02x fastermeteor_contest: Mean +- std dev: [base] 128 ms +- 1 ms -> [patch] 126 ms +- 1 ms: 1.01x fasternbody: Mean +- std dev: [base] 111 ms +- 2 ms -> [patch] 113 ms +- 2 ms: 1.01x slowernqueens: Mean +- std dev: [base] 106 ms +- 1 ms -> [patch] 106 ms +- 1 ms: 1.00x slowerpathlib: Mean +- std dev: [base] 23.0 ms +- 0.4 ms -> [patch] 23.2 ms +- 0.7 ms: 1.01x slowerpickle: Mean +- std dev: [base] 10.5 us +- 0.1 us -> [patch] 10.6 us +- 0.4 us: 1.01x slowerpickle_dict: Mean +- std dev: [base] 30.9 us +- 0.2 us -> [patch] 31.3 us +- 0.3 us: 1.02x slowerpickle_list: Mean +- std dev: [base] 4.41 us +- 0.04 us -> [patch] 4.55 us +- 0.06 us: 1.03x slowerpyflate: Mean +- std dev: [base] 532 ms +- 8 ms -> [patch] 535 ms +- 4 ms: 1.01x slowerregex_compile: Mean +- std dev: [base] 163 ms +- 1 ms -> [patch] 165 ms +- 1 ms: 1.01x slowerregex_dna: Mean +- std dev: [base] 238 ms +- 2 ms -> [patch] 213 ms +- 1 ms: 1.12x fasterregex_effbot: Mean +- std dev: [base] 3.50 ms +- 0.03 ms -> [patch] 3.08 ms +- 0.03 ms: 1.14x fasterregex_v8: Mean +- std dev: [base] 29.2 ms +- 0.7 ms -> [patch] 25.8 ms +- 0.9 ms: 1.13x fasterrichards: Mean +- std dev: [base] 57.4 ms +- 1.5 ms -> [patch] 58.2 ms +- 1.3 ms: 1.01x slowerscimark_fft: Mean +- std dev: [base] 447 ms +- 10 ms -> [patch] 452 ms +- 2 ms: 1.01x slowerscimark_lu: Mean +- std dev: [base] 142 ms +- 2 ms -> [patch] 145 ms +- 1 ms: 1.02x slowerscimark_sor: Mean +- std dev: [base] 155 ms +- 1 ms -> [patch] 156 ms +- 2 ms: 1.01x slowerscimark_sparse_mat_mult: Mean +- std dev: [base] 6.18 ms +- 0.11 ms -> [patch] 6.03 ms +- 0.06 ms: 1.02x fasterspectral_norm: Mean +- std dev: [base] 150 ms +- 7 ms -> [patch] 146 ms +- 3 ms: 1.02x fasterunpack_sequence: Mean +- std dev: [base] 52.6 ns +- 1.0 ns -> [patch] 52.2 ns +- 0.6 ns: 1.01x fasterunpickle_list: Mean +- std dev: [base] 5.77 us +- 0.06 us -> [patch] 5.88 us +- 0.10 us: 1.02x slowerxml_etree_parse: Mean +- std dev: [base] 176 ms +- 3 ms -> [patch] 173 ms +- 2 ms: 1.02x fasterBenchmark hidden because not significant (17): deltablue, fannkuch, hexiom, logging_simple, pickle_pure_python, pidigits, python_startup, python_startup_no_site, raytrace, scimark_monte_carlo, sqlite_synth, telco, unpickle, unpickle_pure_python, xml_etree_iterparse, xml_etree_generate, xml_etree_processGeometric mean: 1.01x faster

Notes

  • We keep the original python code for the implementation ofcopy.deepcopy (seehttps://peps.python.org/pep-0399/). In the CI we test both the pure python and accelerator version ofdeepcopy.
  • The performance improvement of deepcopy has a measurable impact for certain operations in projects such as lmfit and qiskit.

gst reacted with thumbs up emojiStarbuck5, Fidget-Spinner, and network-shark reacted with rocket emoji
@eendebakpteendebakpt marked this pull request as draftApril 16, 2022 18:44
@eendebakpteendebakpt marked this pull request as ready for reviewApril 16, 2022 21:03
@eendebakpt
Copy link
ContributorAuthor

@pablogsal Would you be able to review this PR?

@eendebakpt
Copy link
ContributorAuthor

@serhiy-storchaka As the latest core develop working on this file, would you be able to review this PR?

@eendebakpteendebakptforce-pushed thedeepcopy_c_implementation branch from83741a6 to2f8e489CompareJune 7, 2022 19:22
@eendebakpteendebakptforce-pushed thedeepcopy_c_implementation branch 2 times, most recently fromaf2a201 to502c747CompareJune 7, 2022 19:29
@AA-TurnerAA-Turner added the extension-modulesC modules in the Modules dir labelJun 7, 2022
Copy link
Member

@AA-TurnerAA-Turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't feel comfortable reviewing the C code, but you will need to add tests to ensure the fallback and C implementation have the same inputs/outputs/etc -- the easiest way would be to duplicate the tests and run one set for the C accelerator and one for the Python version.

A

@eendebakpteendebakptforce-pushed thedeepcopy_c_implementation branch from2c18cb6 toc0fd5c9CompareJune 8, 2022 11:25
@eendebakpt
Copy link
ContributorAuthor

I don't feel comfortable reviewing the C code, but you will need to add tests to ensure the fallback and C implementation have the same inputs/outputs/etc -- the easiest way would be to duplicate the tests and run one set for the C accelerator and one for the Python version.

A

@AA-Turner The_deepcopy_fallback is only used for objects not handled by the Cdeepcopy method (and vice versa). therefore we cannot test both the fallback and C on the same inputs. Thedeepcopy is the public method (and_deepcopy_fallback is called viadeepcopy), so I think we only need to test ondeepcopy. If there are any tests you would like me to add, let me know.

Note: in earlier versions of the PR the fallback there was a funnytry-except statement to fix a build problem. This I resolved by making_copy a builtin module.

@AA-Turner
Copy link
Member

The _deepcopy_fallback is only used for objects not handled by the C deepcopy method (and vice versa). therefore we cannot test both the fallback and C on the same inputs.

If both functions exist and can be theoretically used, both must be tested. You canimport copy._deepcopy_fallback andimport _copy.deepcopy to test the functions independently. If I understand correctly, in your current patch_deepcopy_fallback is only ever called from the C layer. If so, you should make this more clear -- oftentimes the C accelerator has a pure Python fallback which implements the same method when the extension module can't be loaded.

A

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@serhiy-storchaka
Copy link
Member

There are some issues with thecopy module. I prefer to resolve them prior to adding the C implementation.

@eendebakpt
Copy link
ContributorAuthor

The _deepcopy_fallback is only used for objects not handled by the C deepcopy method (and vice versa). therefore we cannot test both the fallback and C on the same inputs.

If both functions exist and can be theoretically used, both must be tested. You canimport copy._deepcopy_fallback andimport _copy.deepcopy to test the functions independently. If I understand correctly, in your current patch_deepcopy_fallback is only ever called from the C layer. If so, you should make this more clear -- oftentimes the C accelerator has a pure Python fallback which implements the same method when the extension module can't be loaded.

A

@tiran noted the fallback will be used by PyPy, so it must indeed be tested. I will add the required tests

@tiran
Copy link
Member

We have helper code to block / force imports to test both pure and C accelerated features.

from test.support.import_helper import import_fresh_modulecopy_py = import_fresh_module('copy', blocked=['_copy'])try:    copy_c = import_fresh_module('copy', fresh=['_copy'])except ImportError:    copy_c = None
eendebakpt reacted with thumbs up emoji

@eendebakpteendebakpt requested a review froma team as acode ownerJune 8, 2022 19:00
@eendebakpt
Copy link
ContributorAuthor

I addressed some of the review comments.

@serhiy-storchaka Could you indicate which issues with thecopy module there are, and whether there already is a timeline on addressing them?

@erlend-aasland
Copy link
Contributor

FYI,test_copy is failing in the Hypothesis CI:https://github.com/python/cpython/actions/runs/6403563044/job/17382333727?pr=91610

@eendebakpt
Copy link
ContributorAuthor

FYI,test_copy is failing in the Hypothesis CI:https://github.com/python/cpython/actions/runs/6403563044/job/17382333727?pr=91610

Will look at later this week

erlend-aasland reacted with thumbs up emoji

@eendebakpt
Copy link
ContributorAuthor

@erlend-aasland The failing tests where due to the recent addition ofcopy.replace. I added the unit tests for new functionality in the same structure as the other unit tests. This means that the pure python methodcopy.replace is tested twice (which is a bit redundant, but with better consistency overall).

The PR is ready for review, although I am not sure the item Serhiy raised should be resolved first. See#103035 (comment) and#109498.

erlend-aasland reacted with thumbs up emoji

@erlend-aasland
Copy link
Contributor

The PR is ready for review, although I am not sure the item Serhiy raised should be resolved first. See#103035 (comment) and#109498.

I'll leave that to@serhiy-storchaka.

@eendebakpt
Copy link
ContributorAuthor

For any reviewers: I also created a performance improvement for the python implementation:#114266.

Depending on the outcome of that PR and some (minor) changes I want to do to make this work in free-threading I will update the benchmarks.

@erlend-aasland
Copy link
Contributor

@eendebakpt, also consider adapting to free-threading in a follow-up PR; this PR is already a huge diff.

@kumaraditya303kumaraditya303 removed their request for reviewMay 27, 2024 11:38
@ghost
Copy link

ghost commentedJun 9, 2024
edited by ghost
Loading

The following commit authors need to sign the Contributor License Agreement:

Click the button to sign:
CLA not signed

@eendebakpt
Copy link
ContributorAuthor

Update of the benchmarks with recent changes to main:

deepcopy dict: Mean +- std dev: [main] 3.75 us +- 0.17 us -> [pr] 587 ns +- 25 ns: 6.40x fasterdeepcopy dataclass: Mean +- std dev: [main] 4.21 us +- 0.08 us -> [pr] 2.57 us +- 0.11 us: 1.64x fasterdeepcopy small dataclass: Mean +- std dev: [main] 3.03 us +- 0.12 us -> [pr] 2.28 us +- 0.10 us: 1.33x fasterdeepcopy small tuple: Mean +- std dev: [main] 732 ns +- 21 ns -> [pr] 72.8 ns +- 2.5 ns: 10.05x fasterdeepcopy repeating: Mean +- std dev: [main] 20.7 us +- 0.7 us -> [pr] 6.49 us +- 0.32 us: 3.18x fasterdeepcopy repeating_atomic: Mean +- std dev: [main] 10.2 us +- 0.3 us -> [pr] 801 ns +- 33 ns: 12.79x fasterGeometric mean: 4.23x faster
erlend-aasland, Eclips4, and a-reich reacted with rocket emoji

@eendebakpt
Copy link
ContributorAuthor

To make a fair comparison between current main and the C implementation in this PR I looked at the python implementation again to see whether there are optimizations we do in the C implementation that could also be applied in the Python main. There are quite some options:

  • Remove the redundant_nil argument todeepcopy
  • Replace try-except constructions with dictionaries with a d.get(key, sentinel) to avoid expensive creation of exceptions
  • Use comprehensions in thedeepcopy_list anddeepcopy_tuple
  • Indeepcopy_tuple combine the creation ofy and the checkfor k, j in zip(x, y): into a single for loop
  • Inline theif cls in _atomic_types: return x part at various locations
  • Inline the call to the (only once used)_keep_alive

I will not make PRs for these, because each of them has a very high likelihood of being rejected because either the performance gain is small or the change would reduce readability or maintainability of the code (for the interested reader: I would give the last option the highest probability of success).

encukou and auvipy reacted with thumbs up emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@tirantirantiran left review comments

@AA-TurnerAA-TurnerAA-Turner left review comments

@kumaraditya303kumaraditya303kumaraditya303 requested changes

@slimshady621slimshady621slimshady621 approved these changes

@corona10corona10Awaiting requested review from corona10corona10 is a code owner

@erlend-aaslanderlend-aaslandAwaiting requested review from erlend-aaslanderlend-aasland is a code owner

Assignees
No one assigned
Labels
awaiting change reviewextension-modulesC modules in the Modules dirperformancePerformance or resource usage
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

C implementation of parts of copy.deepcopy
9 participants
@eendebakpt@AA-Turner@bedevere-bot@serhiy-storchaka@tiran@erlend-aasland@kumaraditya303@slimshady621@AlexWaygood

[8]ページ先頭

©2009-2025 Movatter.jp