Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-121313: Limit the reading size from pipes to their default buffer size on Unix systems#121315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged

Conversation

@aplaikner
Copy link
Contributor

@aplaikneraplaikner commentedJul 3, 2024
edited by bedevere-appbot
Loading

@ghost
Copy link

ghost commentedJul 3, 2024
edited by ghost
Loading

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply theskip news label instead.

@bedevere-app
Copy link

Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply theskip news label instead.

@bedevere-app
Copy link

Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply theskip news label instead.

@cmaloney
Copy link
Contributor

cmaloney commentedJul 4, 2024
edited
Loading

os.read() /_os_read_impl is used for reading from most kinds of files in Python. Definitely the limited size makes sense for pipes, but disk I/O generally wants "as big a read as possible". For instance reading regular files, such as python source code, one read call with a buffer that can fit the whole file is fastest in my experimenting. For both that case and the pipe case, it would be more efficient to figure out "whats the max read size" once (with the system calls that entails potentially) and re-use that for every subsequent read call

Following your chain of pieces, could this be made to be more targeted to the specific case potentially? Two thoughts

  1. This is specifically caused byLib/multiprocessing/connection.py, can that specify explicitly the size of read it wants?
  2. Rather than checking / adjusting the size for every read, could that be done just when the pipe is opened/created? So on open, check type, and stash the "max read size". Compare against that (The code currently checks against_PY_READ_MAX constant, this would just be saying max read size is file type dependent, which is true on both Windows and Linux)

See also:gh-117151 which is aiming to increase the default size (albeit focused around write performance)

@aplaikner
Copy link
ContributorAuthor

aplaikner commentedJul 5, 2024
edited
Loading

I've tried shifting the check toLib/multiprocessing/connection.py and it seems promising, yielding the same performance improvements as having the checks in the C code. The change toos_read_impl would be reverted and the following patch applied toLib/multiprocessing/connection.py:

diff --git a/Lib/multiprocessing/connection.py b/Lib/multiprocessing/connection.pyindex b7e1e13217..4797ca4df8 100644--- a/Lib/multiprocessing/connection.py+++ b/Lib/multiprocessing/connection.py@@ -18,6 +18,7 @@ import time import tempfile import itertools+import stat   from . import util@@ -391,8 +392,17 @@ def _recv(self, size, read=_read):         buf = io.BytesIO()         handle = self._handle         remaining = size+        is_pipe = False+        page_size = 0+        if not _winapi:+            page_size = os.sysconf(os.sysconf_names['SC_PAGESIZE'])+            if size > 16 * page_size:+                mode = os.fstat(handle).st_mode+                is_pipe = stat.S_ISFIFO(mode)+        limit = 16 * page_size if is_pipe else remaining         while remaining > 0:-            chunk = read(handle, remaining)+            to_read = min(limit, remaining)+            chunk = read(handle, to_read)             n = len(chunk)             if n == 0:                 if remaining == size:

Copy link
Contributor

@cmaloneycmaloney left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looking reasonable to me overall: Unlikely to break compatibility or reduce performance, improves default behavior. A couple smaller change requests from me.

It would be nice to add a test that will fail if something breaks / results in the "read too large on pipes resulting in bad behavior" again, although I don't see a straightforward way to do that (Maybe mockingConnection._read in a new test in_test_multiprocessing and checking the size of read when know it is a pipe?)

importtime
importtempfile
importitertools
importstat
Copy link
Contributor

@cmaloneycmaloneyJul 5, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Personal nitpick,PEP-8 doesn't seem to specify (https://peps.python.org/pep-0008/#imports), but I like imports to be alphabetical.itertools,time, andtempfile which were already in the code just above this are also out of order (although time and tempfile only slightly). Rest are in order. Not sure if it matters for Python core developer acceptance

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done

cmaloney reacted with thumbs up emoji
is_pipe=False
page_size=0
ifnot_winapi:
page_size=os.sysconf(os.sysconf_names['SC_PAGESIZE'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Rather than do theif not _winapi here, which has to be run/interpreted per_recv call, can you add the "calculate max size for a fifo" likehttps://github.com/python/cpython/blob/main/Lib/multiprocessing/connection.py#L370-L379 does to choose/define the standard read function? Code here will still need to do themin logic + "is this a fifo", but at least reduces overhead work a little bit further.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've shifted fetching the base page size and calculating the default pipe size to the existingif _winapi block above. Is this what you meant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yep, looking good

@@ -0,0 +1 @@
Limit reading size in os.read for pipes to default pipe size in order to avoid memory overallocation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This should be updated fromos.read ->multiprocessing to follow the logic location change.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Done

cmaloney reacted with thumbs up emoji
@cmaloney
Copy link
Contributor

I think as far as I can review / needs a python core dev / someone with more project familiarity to look for high level things.

Some lingering thoughts I have:

  1. Would it make more sense to usefcntlF_GETPIPE_SZ rather than caluclate? I hadn't known about that until reading through the pipe man page linked.
  2. How does this work for non-linux systems? Particularly FreeBSD and Apple systems that are Python supported (https://peps.python.org/pep-0011/#support-tiers). I'm not familiar with pipes on those platforms at all currently.

@aplaikner
Copy link
ContributorAuthor

  1. When usingfcntl, an additional system call per_recv would be necessary. The main issue is that the code must be executed inside the_recv function becausefcntl requires the pipe's file descriptor. To avoid errors, a check to determine if the system is Windows would be needed before executingfcntl. This could be done with a boolean set inside theif _winapi check. Additionally, there should be a check to verify if the file descriptor belongs to a pipe before attempting to fetch the pipe size. This results in two checks before obtaining the pipe size.
    To optimize performance, these checks could be wrapped in another condition to verify if the read size is smaller than the default pipe size, skipping that code. Otherwise at least thefstat system call would be executed. However, this would again lead to a hardcoded value.
    Usingfcntl would provide a more dynamic approach, it would come at the cost of reduced performance due to the additional system calls and other checks, reducing performance.
    I think the current solution covers most cases, where the default pipe size is used. If someone changes that value, they would also need to change the new constant to see some performance benefits.
  2. I'm also not familiar with pipes on those systems, but it seems that FreeBSD and MacOS have both a default pipe buffer size of 64KiB:https://www.netmeister.org/blog/ipcbufs.html

@aplaikner
Copy link
ContributorAuthor

Hi@cmaloney, I wanted to check in and see if there are any additional steps I need to take for this pull request before it can be reviewed by a core developer.

Thank you!

@cmaloney
Copy link
Contributor

Re: Core Review, as far as I know no other steps needed. Fromhttps://devguide.python.org/getting-started/pull-request-lifecycle/#reviewing it's mainly just patience, that document suggests a month wait before pinging other locations.

aplaikner reacted with thumbs up emoji

@gpsheadgpshead self-assigned thisAug 31, 2024
@gpsheadgpshead added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelAug 31, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@gpshead for commit52606d1 🤖

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelAug 31, 2024
@gpshead
Copy link
Member

There's one potential further optimization, at least on Linux.fcntlF_GETPIPE_SZ on thefd if it is a pipe should return the actual size. A pipe might have been configured differently than the platform default. Regardless I don't expect that will have been the case within this multiprocessing code. Using that (andF_SETPIPE_SZ) could be a future enhancement (assuming it proves useful).

@gpsheadgpshead merged commit74bfb53 intopython:mainAug 31, 2024
@gpshead
Copy link
Member

Thanks for taking this on!

@methane
Copy link
Member

2. I'm also not familiar with pipes on those systems, but it seems that FreeBSD and MacOS have both a default pipe buffer size of 64KiB:https://www.netmeister.org/blog/ipcbufs.html

This PR uses 256KiB, not 64KiB on M1 mac (16K page).

gpshead reacted with thumbs up emoji

@vstinner
Copy link
Member

The Changelog entry was added to C API category, instead of the Library category.

@methane
Copy link
Member

Nice catch. I will change the category in#123559.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@methanemethanemethane left review comments

@devdanzindevdanzindevdanzin left review comments

@cmaloneycmaloneycmaloney approved these changes

@gpsheadgpsheadAwaiting requested review from gpsheadgpshead is a code owner

+1 more reviewer

@TalAmuyalTalAmuyalTalAmuyal left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

@gpsheadgpshead

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

8 participants

@aplaikner@cmaloney@bedevere-bot@gpshead@methane@vstinner@TalAmuyal@devdanzin

[8]ページ先頭

©2009-2025 Movatter.jp