Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-89550: Buffer GzipFile.write to reduce execution time by ~15%#101251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
gpshead merged 17 commits intopython:mainfromCCLDArjun:buffered-gzip-writes
May 8, 2023

Conversation

CCLDArjun
Copy link
Contributor

@CCLDArjunCCLDArjun commentedJan 22, 2023
edited
Loading

Currently, all of this is done for every write() call:
https://github.com/python/cpython/blob/main/Lib/gzip.py#L266-L288

This pr stores GzipFile.write() data into a buffer to postpone most of the compression work to when the buffer flushes. This way, we can call into zlib andFile.write() less frequently.

benchmarking with timeit:

../python-mtimeit-s 'importgzip' 'withgzip.open("./Python-3.11.1.tgz","rb")asinfile:' 'withgzip.open("./Python-3.11.1new.tgz","wb")asoutfile:' 'forlineininfile:' 'outfile.write(line)' ''1loop,bestof5:6.36secperloop

vs

../python-mtimeit-s 'importgzipoldasgzip' 'withgzip.open("./Python-3.11.1.tgz","rb")asinfile:' 'withgzip.open("./Python-3.11.1new.tgz","wb")asoutfile:' 'forlineininfile:' 'outfile.write(line)' ''1loop,bestof5:7.68secperloop

rhpvorderman and marcelm reacted with hooray emojirhpvorderman reacted with heart emoji
@bedevere-bot
Copy link

Most changes to Pythonrequire a NEWS entry.

Please add it using theblurb_it web app or theblurb command-line tool.

@CCLDArjunCCLDArjun changed the titleBuffer gzip writes to reduce execution time by ~15%gh-89550: Buffer GzipFile.write to reduce execution time by ~15%Jan 22, 2023
@AlexWaygoodAlexWaygood added the performancePerformance or resource usage labelJan 22, 2023
@rhpvorderman
Copy link
Contributor

Wow@CCLDArjun such a simple and effective solution. My solution was to split out a _GzipWriter (just like _GzipReader) and try to work from there, but I never got that to work easily and it was quite some extra code too. I am so glad you made this. I hope this will get merged soon!

CCLDArjun reacted with heart emoji

@CCLDArjun
Copy link
ContributorAuthor

My solution was to split out a _GzipWriter (just like _GzipReader) and try to work from there

Initially that was my idea too, where I subclassed_pyio.BufferedWriter to override its flush methods but that also wasn't working easily and was pretty messy:943ca9c

@rhpvorderman
Copy link
Contributor

I am glad that you were able to come up with a better idea. I have been hacking at the gzip module pretty thoroughly but I could not come up with this. I am in awe 😀 . I will port this topython-isal and I have recently started on making azlib-ng module too. That way people will not have to wait for 3.12 to ship to get all this fast compression.

@orsenthil
Copy link
Member

This looks good to me. A core developer in the expert list will review before merging.

@rhpvorderman
Copy link
Contributor

@orsenthil This is still labeled "awaiting review" rather then "awaiting core review" does that matter?

@arhadthedev
Copy link
Member

@rhpvorderman Only review messages from the Python team count.

@Yhg1s,@gpshead as zipfile experts (not quite the same but still).

Copy link
Member

@gpsheadgpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Overall I like this change. It's great how cleanly this can be added. :)

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@gpsheadgpshead self-assigned thisFeb 18, 2023
@CCLDArjun
Copy link
ContributorAuthor

@gpshead tests are passing! Can you re review?

@rhpvorderman
Copy link
Contributor

ping

It would be great if this gets into 3.12. The measured execution difference of 15%is at gzip level 9 which is the default. At a more modest compression level such as 1, the difference can be a couple of hundreds of percents.

@CCLDArjun
Copy link
ContributorAuthor

pinging@gpshead for a final review

@gpsheadgpsheadenabled auto-merge (squash)May 8, 2023 17:29
@gpsheadgpshead merged commit9af4854 intopython:mainMay 8, 2023
jbower-fb pushed a commit to jbower-fb/cpython that referenced this pull requestMay 8, 2023
python#101251)Use `io.BufferedWriter` to buffer gzip writes.---------Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>Co-authored-by: Gregory P. Smith <greg@krypto.org>
carljm added a commit to carljm/cpython that referenced this pull requestMay 9, 2023
* main: (47 commits)pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)pythonGH-104284: Fix documentation gettext build (python#104296)pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)pythongh-99108: fix typo in Modules/Setup (python#104293)pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172)pythongh-103193: Improve `getattr_static` test coverage (python#104286)  Trim trailing whitespace and test on CI (python#104275)pythongh-102500: Remove mention of bytes shorthand (python#104281)pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256)pythongh-99108: Replace SHA3 implementation HACL* version (python#103597)pythongh-104273: Remove redundant len() calls in argparse function (python#104274)pythongh-64660: Don't hardcode Argument Clinic return converter result variable name (python#104200)pythongh-104265 Disallow instantiation of `_csv.Reader` and `_csv.Writer` (python#104266)pythonGH-102613: Improve performance of `pathlib.Path.rglob()` (pythonGH-104244)pythongh-103650: Fix perf maps address format (python#103651)pythonGH-89812: Churn `pathlib.Path` methods (pythonGH-104243)  ...
carljm added a commit to carljm/cpython that referenced this pull requestMay 9, 2023
* main: (29 commits)pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277)pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298)pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320)  require-pr-label.yml: Add missing "permissions:" (python#104309)pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939)pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)pythonGH-104284: Fix documentation gettext build (python#104296)pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)pythongh-99108: fix typo in Modules/Setup (python#104293)pythonGH-104145: Use fully-qualified cross reference types for the bisect module (python#104172)pythongh-103193: Improve `getattr_static` test coverage (python#104286)  Trim trailing whitespace and test on CI (python#104275)pythongh-102500: Remove mention of bytes shorthand (python#104281)pythongh-97696: Improve and fix documentation for asyncio eager tasks (python#104256)pythongh-99108: Replace SHA3 implementation HACL* version (python#103597)  ...
carljm added a commit to carljm/cpython that referenced this pull requestMay 9, 2023
* main: (156 commits)pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304)pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441)pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096)pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217)pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312)pythongh-104240: return code unit metadata from codegen (python#104300)pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277)pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298)pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320)  require-pr-label.yml: Add missing "permissions:" (python#104309)pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939)pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)pythonGH-104284: Fix documentation gettext build (python#104296)pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)pythongh-99108: fix typo in Modules/Setup (python#104293)  ...
carljm added a commit to carljm/cpython that referenced this pull requestMay 9, 2023
* main: (35 commits)pythongh-97696 Add documentation for get_coro() behavior with eager tasks (python#104304)pythongh-97933: (PEP 709) inline list/dict/set comprehensions (python#101441)pythongh-99889: Fix directory traversal security flaw in uu.decode() (python#104096)pythongh-104184: fix building --with-pydebug --enable-pystats (python#104217)pythongh-104139: Add itms-services to uses_netloc urllib.parse. (python#104312)pythongh-104240: return code unit metadata from codegen (python#104300)pythongh-104276: Make `_struct.unpack_iterator` type use type flag instead of custom constructor (python#104277)pythongh-97696: Move around and update the whatsnew entry for asyncio eager task factory (python#104298)pythongh-103193: Fix refleaks in `test_inspect` and `test_typing` (python#104320)  require-pr-label.yml: Add missing "permissions:" (python#104309)pythongh-90656: Add platform triplets for 64-bit LoongArch (LA64) (python#30939)pythongh-104180: Read SOCKS proxies from macOS System Configuration (python#104181)pythongh-97696 Remove unnecessary check for eager_start kwarg (python#104188)pythonGH-104308: socket.getnameinfo should release the GIL (python#104307)pythongh-104310: Add importlib.util.allowing_all_extensions() (pythongh-104311)pythongh-99113: A Per-Interpreter GIL! (pythongh-104210)pythonGH-104284: Fix documentation gettext build (python#104296)pythongh-89550: Buffer GzipFile.write to reduce execution time by ~15% (python#101251)pythongh-104223: Fix issues with inheriting from buffer classes (python#104227)pythongh-99108: fix typo in Modules/Setup (python#104293)  ...
Yhg1s added a commit to Yhg1s/cpython that referenced this pull requestJun 19, 2023
GzipFile.flush() to not flush the compressor (nor pass along the zip_modeargument).
Yhg1s added a commit to Yhg1s/cpython that referenced this pull requestJun 19, 2023
…h() tonot flush the compressor (nor pass along the zip_mode argument).
gpshead pushed a commit that referenced this pull requestJun 19, 2023
Fix a regression introduced inGH-101251, causing GzipFile.flush() tonot flush the compressor (nor pass along the zip_mode argument).
miss-islington pushed a commit to miss-islington/cpython that referenced this pull requestJun 19, 2023
…onGH-105910)Fix a regression introduced inpythonGH-101251, causing GzipFile.flush() tonot flush the compressor (nor pass along the zip_mode argument).(cherry picked from commit1858db7)Co-authored-by: T. Wouters <thomas@python.org>
Yhg1s added a commit that referenced this pull requestJun 19, 2023
#105920)GH-105808: Fix a regression introduced inGH-101251 (GH-105910)Fix a regression introduced inGH-101251, causing GzipFile.flush() tonot flush the compressor (nor pass along the zip_mode argument).(cherry picked from commit1858db7)Co-authored-by: T. Wouters <thomas@python.org>
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@svissersvissersvisser left review comments

@gpsheadgpsheadgpshead approved these changes

Assignees

@gpsheadgpshead

Labels
performancePerformance or resource usagestdlibPython modules in the Lib dir
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

8 participants
@CCLDArjun@bedevere-bot@rhpvorderman@orsenthil@arhadthedev@gpshead@svisser@AlexWaygood

[8]ページ先頭

©2009-2025 Movatter.jp