Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-118761:email.quoprimime removingre import#132046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
Marius-Juston wants to merge11 commits intopython:main
base:main
Choose a base branch
Loading
fromMarius-Juston:quoprimime

Conversation

Marius-Juston
Copy link
Contributor

@Marius-JustonMarius-Juston commentedApr 3, 2025
edited
Loading

This pull request removes there module from theemail.quoprimime, thus increasing the import speed from5676 us to3669 us (a 60% import speed increase );

From

marius@DESKTOP-IOUM5DH:~/cpython$ ./python -X importtime -c"import email.quoprimime"import time: self [us]| cumulative| imported packageimport time:        88|         88|   _ioimport time:        19|         19|   marshalimport time:       143|        143|   posiximport time:       332|        580| _frozen_importlib_externalimport time:        42|         42|timeimport time:       125|        166| zipimportimport time:        25|         25|     _codecsimport time:       290|        315|   codecsimport time:       190|        190|   encodings.aliasesimport time:       417|        921| encodingsimport time:        90|         90| encodings.utf_8import time:        44|         44| _signalimport time:        22|         22|     _abcimport time:        93|        114|   abcimport time:       484|        484|   _collections_abcimport time:       136|        733| ioimport time:        22|         22|       _statimport time:        59|         80|     statimport time:        31|         31|       errnoimport time:        43|         43|       genericpathimport time:        87|        160|     posixpathimport time:       283|        523|   osimport time:        50|         50|   _sitebuiltinsimport time:        86|         86|   sitecustomizeimport time:        30|         30|   usercustomizeimport time:       216|        902| siteimport time:       114|        114| linecacheimport time:       203|        203|   emailimport time:        22|         22|     _stringimport time:       140|        140|         typesimport time:       795|        935|       enumimport time:        34|         34|         _sreimport time:       130|        130|           re._constantsimport time:       181|        311|         re._parserimport time:        49|         49|         re._casefiximport time:       198|        591|       re._compilerimport time:        57|         57|           itertoolsimport time:        75|         75|           keywordimport time:        41|         41|             _operatorimport time:       153|        194|           operatorimport time:        98|         98|           reprlibimport time:        32|         32|           _collectionsimport time:       553|       1006|         collectionsimport time:        30|         30|         _functoolsimport time:       346|       1381|       functoolsimport time:        95|         95|       copyregimport time:       311|       3311|     reimport time:       365|       3697|   stringimport time:      1777|       5676| email.quoprimime

To

marius@DESKTOP-IOUM5DH:~/cpython$ ./python -X importtime -c"import email.quoprimime"import time: self [us]| cumulative| imported packageimport time:        89|         89|   _ioimport time:        18|         18|   marshalimport time:       130|        130|   posiximport time:       305|        541| _frozen_importlib_externalimport time:        37|         37|timeimport time:       115|        152| zipimportimport time:        24|         24|     _codecsimport time:       273|        296|   codecsimport time:       175|        175|   encodings.aliasesimport time:       387|        857| encodingsimport time:        83|         83| encodings.utf_8import time:        40|         40| _signalimport time:        16|         16|     _abcimport time:        88|        103|   abcimport time:       422|        422|   _collections_abcimport time:       125|        649| ioimport time:        20|         20|       _statimport time:        54|         73|     statimport time:        29|         29|       errnoimport time:        39|         39|       genericpathimport time:        81|        148|     posixpathimport time:       269|        490|   osimport time:        48|         48|   _sitebuiltinsimport time:        80|         80|   sitecustomizeimport time:        28|         28|   usercustomizeimport time:       199|        842| siteimport time:       106|        106| linecacheimport time:       189|        189|   emailimport time:        18|         18|     _stringimport time:       124|        124|         typesimport time:       667|        791|       enumimport time:        33|         33|         _sreimport time:       130|        130|           re._constantsimport time:       177|        307|         re._parserimport time:        49|         49|         re._casefiximport time:       179|        566|       re._compilerimport time:        58|         58|           itertoolsimport time:        74|         74|           keywordimport time:        36|         36|             _operatorimport time:       148|        183|           operatorimport time:        96|         96|           reprlibimport time:        31|         31|           _collectionsimport time:       494|        934|         collectionsimport time:        27|         27|         _functoolsimport time:       315|       1274|       functoolsimport time:        87|         87|       copyregimport time:       284|       3000|     reimport time:       297|       3314|   stringimport time:       168|       3669| email.quoprimime

however, the new implementation does increase the compute time

TEST_CASES= {"empty":"Dracula","empty_medium":"Dracula"*10,"empty_long":"Dracula"*100,"short":"Hello=20World=21","medium":"This_is_a_test=3F=3D=2E"*10,"long":"Some_long_text_with_encoding=20"*100,"mixed":"A=2Equick=20brown=5Ffox=21=3F"*50,"edge_case_short":"=20=21=3F=2E=5F","edge_case_long":"=20=21=3F=2E=5F"*200}
Benchmarkregexnon_regex
empty284 ns382 ns: 1.34x slower
empty_medium302 ns2.99 us: 9.91x slower
empty_long371 ns28.6 us: 77.20x slower
short731 ns902 ns: 1.23x slower
medium6.24 us11.8 us: 1.89x slower
long25.5 us137 us: 5.37x slower
mixed57.0 us71.5 us: 1.25x slower
edge_case_short1.36 us916 ns: 1.48x faster
edge_case_long178 us160 us: 1.11x faster
Geometric mean(ref)2.78x slower

So it is very possible that this is not worth it.

Issues:

@Marius-JustonMarius-Juston requested a review froma team as acode ownerApril 3, 2025 10:22
@Marius-JustonMarius-Juston changed the titlegh-118761: Quoprimime removingre importgh-118761:email.quoprimime removingre importApr 3, 2025
@Marius-Juston
Copy link
ContributorAuthor

Marius-Juston commentedApr 3, 2025
edited
Loading

The PR:

will probably drastically improve the speed as well as oncestring lazy importsre it will drastically speed up thestring import and this module only uses thestring module to import constantsfrom string import ascii_letters, digits, hexdigits

@Marius-Juston
Copy link
ContributorAuthor

I did not notice that the warmup needed for./python -X importtime -c 'import email.quoprimime' and so the more accurate timings are actually:

regex: 153.9974 ± 35.97 (103 to 1778; n=10000)non_regex: 148.4565 ± 25.48 (125 to 991; n=10000)

@Marius-Juston
Copy link
ContributorAuthor

( the new _HEX_TO_CHAR cache could also be used for thedecode function as well afterwards since it checks for more or less the same thing)

# Decode if in form =ABelifi+2<nandline[i+1]inhexdigitsandline[i+2]inhexdigits:decoded+=unquote(line[i:i+3])

@Marius-Juston
Copy link
ContributorAuthor

Benchmarkregexnon_regex_2
empty288 ns259 ns: 1.11x faster
empty_medium299 ns1.74 us: 5.81x slower
empty_long375 ns16.3 us: 43.61x slower
short725 ns714 ns: 1.01x faster
medium6.22 us7.97 us: 1.28x slower
long22.0 us85.9 us: 3.91x slower
mixed49.5 us56.3 us: 1.14x slower
edge_case_short1.26 us744 ns: 1.69x faster
edge_case_long177 us125 us: 1.41x faster
Geometric mean(ref)2.01x slower

Slightly faster

@Marius-Juston
Copy link
ContributorAuthor

Adding the '=' check now speeds things up:

Benchmarkregexnon_regex
empty288 ns53.6 ns: 5.37x faster
empty_medium299 ns54.1 ns: 5.53x faster
empty_long375 ns62.5 ns: 6.00x faster
short725 ns722 ns: 1.00x faster
medium6.22 us8.09 us: 1.30x slower
long22.0 us86.7 us: 3.94x slower
mixed49.5 us58.6 us: 1.18x slower
edge_case_short1.26 us767 ns: 1.64x faster
edge_case_long177 us127 us: 1.39x faster
Geometric mean(ref)1.60x faster

@Marius-Juston
Copy link
ContributorAuthor

Marius-Juston commentedApr 3, 2025
edited
Loading

As a comparison (if you compile the regex for the function + add early exit)

c=re.compile("=[a-fA-F0-9]{2}",flags=re.ASCII)defheader_decode_re(s):"""Decode a string using regex."""s=s.replace('_',' ')# Replace underscores with spacesif'='ins:returnc.sub(_unquote_match,s)returns
Benchmarkregexregex2non_regex
empty288 ns51.4 ns: 5.60x faster53.6 ns: 5.37x faster
empty_medium299 ns52.0 ns: 5.76x faster54.1 ns: 5.53x faster
empty_long375 ns61.0 ns: 6.15x faster62.5 ns: 6.00x faster
short725 ns560 ns: 1.29x faster722 ns: 1.00x faster
medium6.22 us6.44 us: 1.04x slower8.09 us: 1.30x slower
long22.0 us22.9 us: 1.04x slower86.7 us: 3.94x slower
mixed49.5 us52.6 us: 1.06x slower58.6 us: 1.18x slower
edge_case_short1.26 us1.12 us: 1.13x faster767 ns: 1.64x faster
edge_case_long177 us189 us: 1.07x slower127 us: 1.39x faster
Geometric mean(ref)1.83x faster1.60x faster

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@Marius-Juston
Copy link
ContributorAuthor

@AA-Turner, what's your opinion on replacing this regex expression (even though it sometimes makes the algorithm slower)?

@Marius-Juston
Copy link
ContributorAuthor

Very slight improvement (mainly on theedge_case_short andshort where string concatenation is faster than using"".join()

Benchmarkregexregex2non_regexnon_regex_add
empty288 ns51.4 ns: 5.60x faster53.6 ns: 5.37x faster51.6 ns: 5.58x faster
empty_medium299 ns52.0 ns: 5.76x faster54.1 ns: 5.53x faster51.6 ns: 5.80x faster
empty_long375 ns61.0 ns: 6.15x faster62.5 ns: 6.00x faster59.9 ns: 6.26x faster
short725 ns560 ns: 1.29x faster722 ns: 1.00x faster674 ns: 1.08x faster
medium6.22 us6.44 us: 1.04x slower8.09 us: 1.30x slower7.82 us: 1.26x slower
long22.0 us22.9 us: 1.04x slower86.7 us: 3.94x slower83.5 us: 3.79x slower
mixed49.5 us52.6 us: 1.06x slower58.6 us: 1.18x slower60.6 us: 1.22x slower
edge_case_short1.26 us1.12 us: 1.13x faster767 ns: 1.64x faster699 ns: 1.80x faster
edge_case_long177 us189 us: 1.07x slower127 us: 1.39x faster127 us: 1.39x faster
Geometric mean(ref)1.83x faster1.60x faster1.66x faster

Copy link
Contributor

@hauntsaninjahauntsaninja left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is slower and harder to maintain, so I'm -1 on this PR

@bedevere-app
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@hauntsaninja
Copy link
Contributor

hauntsaninja commentedApr 6, 2025
edited
Loading

Actually I don't understand this PR.

It looks like we still importre transitively? Also I don't understand why the "self" time reported by-X importtime in your PR body goes from 1777 to 168, if anything looks like quoprimime.py does more work at import time now

AA-Turner reacted with thumbs up emoji

@AA-Turner
Copy link
Member

AA-Turner commentedApr 6, 2025
edited
Loading

I'm a bit lost on the current benchmarks, but the most recent comment (withnon_regex_add) appears to indicate this is slightly faster. That said, I agree with@hauntsaninja that the algorithm in the PR is too complicated and will be difficult to maintain, in contrast to the one-liner regular expression.

It looks like we still import re transitively?

Throughstring, see#132037 to help there.

A

@AA-Turner
Copy link
Member

AA-Turner commentedApr 6, 2025
edited
Loading

Also I don't understand why the "self" time reported by-X importtime in your PR body goes from 1777 to 168, if anything looks like quoprimime.py does more work at import time now

I agree this is odd. I've been using the below (rough) script to benchmark import times, for more data points than just a single run.

bench.py
importsubprocess,sysimportstatisticsBASE_CMD= (sys.executable,'-Ximporttime','-S','-c',)defrun_importtime(mod:str)->str:returnsubprocess.run(BASE_CMD+ (f'import{mod}',),check=True,capture_output=True,encoding='utf-8').stderrformodinsys.argv[1:]:for_inrange(5):# warmuplines=run_importtime(mod)print(lines.partition('\n')[0])own_times= []cum_times= []for_inrange(50):lines=run_importtime(mod)final_line=lines.rstrip().rpartition('\n')[-1]# print(final_line)# import time:       {own} |       {cum} | {mod}own,cum=map(int,final_line.split()[2:5:2])own_times.append(own)cum_times.append(cum)own_times.sort()cum_times.sort()own_times[:]=own_times[10:-10]cum_times[:]=cum_times[10:-10]forlabel,timesin [('own',own_times), ('cumulative',cum_times)]:print()print(f'import{mod}:{label} time')print(f'mean:{statistics.mean(times):.3f} µs')print(f'median:{statistics.median(times):.3f} µs')print(f'stdev:{statistics.stdev(times):.3f}')print('min:',min(times))print('max:',max(times))

@python-cla-bot
Copy link

All commit authors signed the Contributor License Agreement.

CLA signed

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@AA-TurnerAA-TurnerAA-Turner left review comments

@hauntsaninjahauntsaninjahauntsaninja requested changes

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@Marius-Juston@hauntsaninja@AA-Turner@ZeroIntensity

[8]ページ先頭

©2009-2025 Movatter.jp