gh-121313: Limit the reading size from pipes to their default buffer size on Unix systems #121315


Merged

gpshead merged 19 commits into python:main from aplaikner:feature-smaller-pipe-buffer-pull-request on Aug 31, 2024


aplaikner (Contributor) commented Jul 3, 2024, edited by bedevere-app bot

Issue: Limit the reading size from pipes to their default buffer size on Unix systems #121313

Add clean code

108e65b

ghost commented Jul 3, 2024, edited by ghost

All commit authors signed the Contributor License Agreement.

bedevere-app bot commented Jul 3, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

bedevere-app bot mentioned this pull request

Jul 3, 2024

Limit the reading size from pipes to their default buffer size on Unix systems #121313

Closed

bedevere-app bot added the awaiting review label

Jul 3, 2024

Fix linting error & sysconf unsupported error

37ca606


Merge branch 'main' into feature-smaller-pipe-buffer-pull-request

df4f307


blurb-it bot and others added 6 commits

July 3, 2024 10:11

📜🤖 Added by blurb_it.

89936c2

Fix news

31c65b4

Make page size non static

0a4e4a3

Merge branch 'main' into feature-smaller-pipe-buffer-pull-request

7afde51

Merge branch 'main' into feature-smaller-pipe-buffer-pull-request

8d9b16e

Remove redundant call to pymin

e6c64fe

cmaloney (Contributor) commented Jul 4, 2024, edited

`os.read()` / `_os_read_impl` is used for reading from most kinds of files in Python. The limited size definitely makes sense for pipes, but disk I/O generally wants "as big a read as possible". For instance, when reading regular files such as Python source code, one read call with a buffer that can fit the whole file is fastest in my experimenting. For both that case and the pipe case, it would be more efficient to figure out "what's the max read size" once (with whatever system calls that potentially entails) and reuse it for every subsequent read call.

Following your chain of pieces, could this potentially be made more targeted to the specific case? Two thoughts:

1. This is specifically caused by `Lib/multiprocessing/connection.py`; can that explicitly specify the size of read it wants?
2. Rather than checking/adjusting the size on every read, could that be done just when the pipe is opened/created? That is, on open, check the file type and stash the "max read size", then compare against that. (The code currently checks against the `_PY_READ_MAX` constant; this would just be saying that the max read size depends on the file type, which is true on both Windows and Linux.)
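The stash-on-open idea in point 2 could look roughly like the sketch below. It is a Unix-only illustration, not the PR's code: `CappedReader` and `_DEFAULT_PIPE_PAGES` are hypothetical names, and the 16-page cap assumes the common Linux default pipe buffer (64 KiB with 4 KiB pages). The file type is checked once with `fstat()` when the descriptor is wrapped, so each later `read()` only costs a `min()`.

```python
import os
import stat

# Hypothetical cap: assumes the pipe buffer is 16 pages (the common Linux default).
_DEFAULT_PIPE_PAGES = 16


class CappedReader:
    """Hypothetical wrapper that decides the maximum read size once, up front (Unix only)."""

    def __init__(self, fd):
        self.fd = fd
        if stat.S_ISFIFO(os.fstat(fd).st_mode):
            # Pipes: never ask for more than the pipe buffer can hold.
            self.max_read = _DEFAULT_PIPE_PAGES * os.sysconf("SC_PAGESIZE")
        else:
            # Regular files and others: no cap, since one large read is usually cheapest.
            self.max_read = None

    def read(self, n):
        if self.max_read is not None:
            n = min(n, self.max_read)
        return os.read(self.fd, n)
```

Regular files keep uncapped reads, matching the observation above that disk I/O prefers one large read.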

See also: gh-117151, which aims to increase the default size (albeit focused on write performance).

aplaikner (Contributor, Author) commented Jul 5, 2024, edited

I've tried moving the check to `Lib/multiprocessing/connection.py` and it seems promising, yielding the same performance improvements as having the checks in the C code. The change to `os_read_impl` would be reverted and the following patch applied to `Lib/multiprocessing/connection.py`:

		diff --git a/Lib/multiprocessing/connection.py b/Lib/multiprocessing/connection.py
		index b7e1e13217..4797ca4df8 100644
		--- a/Lib/multiprocessing/connection.py
		+++ b/Lib/multiprocessing/connection.py
		@@ -18,6 +18,7 @@ import time
		 import tempfile
		 import itertools
		+import stat
		 
		 
		 from . import util
		@@ -391,8 +392,17 @@ def _recv(self, size, read=_read):
		         buf = io.BytesIO()
		         handle = self._handle
		         remaining = size
		+        is_pipe = False
		+        page_size = 0
		+        if not _winapi:
		+            page_size = os.sysconf(os.sysconf_names['SC_PAGESIZE'])
		+            if size > 16 * page_size:
		+                mode = os.fstat(handle).st_mode
		+                is_pipe = stat.S_ISFIFO(mode)
		+        limit = 16 * page_size if is_pipe else remaining
		         while remaining > 0:
		-            chunk = read(handle, remaining)
		+            to_read = min(limit, remaining)
		+            chunk = read(handle, to_read)
		             n = len(chunk)
		             if n == 0:
		                 if remaining == size:

Shift pipe check to connection.py _recv

936b601

aplaikner requested a review from gpshead as a code owner

July 5, 2024 07:43

aplaikner added 2 commits

July 5, 2024 13:04

Make pipe size dependant on systems page size

49d8adb

Only execute fstat in case reading size is bigger than default pipe size

43e19dd

cmaloney reviewed

Jul 5, 2024

cmaloney (Contributor) left a comment, edited

Looking reasonable to me overall: Unlikely to break compatibility or reduce performance, improves default behavior. A couple smaller change requests from me.

It would be nice to add a test that fails if something breaks and reintroduces the "read too large on pipes, resulting in bad behavior" problem, although I don't see a straightforward way to do that. (Maybe mock `Connection._read` in a new test in `_test_multiprocessing` and check the size of the read when we know it is a pipe?)
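One possible shape for such a test, sketched without touching CPython's private `Connection._recv`: `recv_exact()` is a hypothetical stand-in that mirrors the read loop from the diff and, like `_recv`, accepts the `read` callable as a parameter. The test passes in a recording `read`, pushes 1 MiB through a real `os.pipe()` from a writer thread, and fails if any single read asks for more than an assumed 64 KiB limit.

```python
import io
import os
import threading
import unittest

# Assumption: 16 pages x 4 KiB, the common Linux default pipe buffer.
PIPE_LIMIT = 64 * 1024


def recv_exact(fd, size, read=os.read, limit=PIPE_LIMIT):
    """Hypothetical stand-in for Connection._recv: read exactly `size` bytes in capped chunks."""
    buf = io.BytesIO()
    remaining = size
    while remaining > 0:
        chunk = read(fd, min(limit, remaining))
        if not chunk:
            raise EOFError("got end of file during message")
        buf.write(chunk)
        remaining -= len(chunk)
    return buf.getvalue()


class PipeReadSizeTest(unittest.TestCase):
    def test_reads_never_exceed_pipe_limit(self):
        r, w = os.pipe()
        payload = os.urandom(1 << 20)  # 1 MiB, far larger than the pipe buffer
        requested = []

        def recording_read(fd, n):
            requested.append(n)  # remember every size the loop asked for
            return os.read(fd, n)

        def writer():
            # closefd=True closes the write end once the payload is written.
            with open(w, "wb", closefd=True) as f:
                f.write(payload)

        t = threading.Thread(target=writer)
        t.start()
        try:
            data = recv_exact(r, len(payload), read=recording_read)
        finally:
            os.close(r)
            t.join()
        self.assertEqual(data, payload)
        self.assertLessEqual(max(requested), PIPE_LIMIT)
```

Run it through `unittest` as usual. If `recv_exact` is changed to pass `remaining` straight to `read`, the first request becomes 1 MiB and the last assertion fails, which is the regression signal asked for above.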

Lib/multiprocessing/connection.py (outdated)

		import time
		import tempfile
		import itertools
		import stat

cmaloney (Contributor), Jul 5, 2024, edited

Personal nitpick: PEP 8 doesn't seem to specify this (https://peps.python.org/pep-0008/#imports), but I like imports to be alphabetical. `itertools`, `time`, and `tempfile`, which were already in the code just above this, are also out of order (although `time` and `tempfile` only slightly). The rest are in order. Not sure if it matters for Python core developer acceptance.

aplaikner (Contributor, Author), Jul 7, 2024

Done

Lib/multiprocessing/connection.py (outdated)

		is_pipe = False
		page_size = 0
		if not _winapi:
		    page_size = os.sysconf(os.sysconf_names['SC_PAGESIZE'])

cmaloney (Contributor), Jul 5, 2024

Rather than doing the `if not _winapi` check here, which has to be run/interpreted on every `_recv` call, can you add the "calculate max size for a FIFO" step the way https://github.com/python/cpython/blob/main/Lib/multiprocessing/connection.py#L370-L379 does to choose/define the standard read function? The code here will still need the `min` logic plus the "is this a FIFO" check, but it at least reduces the overhead a little further.

aplaikner (Contributor, Author), Jul 7, 2024

I've moved fetching the base page size and calculating the default pipe size into the existing `if _winapi` block above. Is this what you meant?

cmaloney (Contributor), Jul 7, 2024

Yep, looking good

Misc/NEWS.d/next/C API/2024-07-03-10-11-53.gh-issue-121313.D7gARW.rst (outdated)

		@@ -0,0 +1 @@
		Limit reading size in os.read for pipes to default pipe size in order to avoid memory overallocation

cmaloney (Contributor), Jul 5, 2024

This should be updated from `os.read` to `multiprocessing` to follow the change in where the logic lives.

aplaikner (Contributor, Author), Jul 7, 2024

Done

cmaloney mentioned this pull request

Jul 6, 2024

GH-120754: Add a strace helper and test set of syscalls for open().read() #121143

Merged

aplaikner and others added 5 commits

July 7, 2024 07:17

Update news message

b0b86e5

Make imports order alphabetical

2b6ff24

Shift calculation for pipe size to existing if _winapi check

e726f51

Fix linting error

59cff4d

Merge branch 'main' into feature-smaller-pipe-buffer-pull-request

da10f8e

TalAmuyal reviewed

Jul 7, 2024

Lib/multiprocessing/connection.py (outdated, resolved)

Create constant for default number of pages per pipe

94d4c4a

cmaloney (Contributor) commented Jul 7, 2024

I think that's as far as I can review; it needs a Python core dev or someone with more project familiarity to look at high-level things.

Some lingering thoughts I have:

1. Would it make more sense to use `fcntl` `F_GETPIPE_SZ` rather than calculating the size? I hadn't known about that until reading through the linked pipe man page.
2. How does this work on non-Linux systems, particularly the FreeBSD and Apple systems that Python supports (https://peps.python.org/pep-0011/#support-tiers)? I'm not familiar with pipes on those platforms at all currently.

aplaikner (Contributor, Author) commented Jul 7, 2024

1. Using `fcntl` would require an additional system call per `_recv`. The main issue is that the code must run inside the `_recv` function, because `fcntl` needs the pipe's file descriptor. To avoid errors, a check for whether the system is Windows would be needed before calling `fcntl`; this could be a boolean set inside the `if _winapi` check. Additionally, there should be a check that the file descriptor belongs to a pipe before trying to fetch the pipe size. That means two checks before obtaining the pipe size.
   To optimize performance, these checks could be wrapped in another condition that skips them when the read size is smaller than the default pipe size; otherwise at least the `fstat` system call would be executed. However, this would again lead to a hardcoded value.
   While `fcntl` would provide a more dynamic approach, it would come at the cost of reduced performance due to the additional system calls and checks.
   I think the current solution covers most cases, where the default pipe size is used. If someone changes that value, they would also need to change the new constant to see the performance benefits.
2. I'm also not familiar with pipes on those systems, but it seems that FreeBSD and MacOS have both a default pipe buffer size of 64KiB: https://www.netmeister.org/blog/ipcbufs.html

aplaikner (Contributor, Author) commented Jul 29, 2024

Hi @cmaloney, I wanted to check in and see whether there are any additional steps I need to take before this pull request can be reviewed by a core developer.

Thank you!

Merge branch 'main' into feature-smaller-pipe-buffer-pull-request

52606d1

cmaloney (Contributor) commented Jul 29, 2024

Re: core review, as far as I know no other steps are needed. From https://devguide.python.org/getting-started/pull-request-lifecycle/#reviewing, it's mainly just patience; that document suggests waiting a month before pinging other places.

cmaloney approved these changes

Aug 2, 2024

bedevere-app bot added the awaiting core review label and removed the awaiting review label

Aug 2, 2024

gpshead self-assigned this

Aug 31, 2024

gpshead added the 🔨 test-with-buildbots label

Aug 31, 2024

bedevere-bot commented Aug 31, 2024

🤖 New build scheduled with the buildbot fleet by @gpshead for commit 52606d1 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

bedevere-bot removed the 🔨 test-with-buildbots label

Aug 31, 2024

gpshead (Member) commented Aug 31, 2024

There's one potential further optimization, at least on Linux: `fcntl` `F_GETPIPE_SZ` on the `fd`, if it is a pipe, should return the actual size. A pipe might have been configured differently than the platform default. Regardless, I don't expect that to be the case within this multiprocessing code. Using that (and `F_SETPIPE_SZ`) could be a future enhancement (assuming it proves useful).
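For reference, a Linux-only sketch of that idea. Python 3.10+ exposes `fcntl.F_GETPIPE_SZ` and `fcntl.F_SETPIPE_SZ` on Linux; the helper names below are hypothetical, and the `hasattr()` guards return `None` where the constants are missing. The kernel may grant a larger capacity than requested, so the value returned by `F_SETPIPE_SZ` is what actually applies.

```python
import fcntl  # Unix only; the pipe-size constants exist only on Linux (Python 3.10+)


def pipe_capacity(fd):
    """Return the kernel buffer size of the pipe behind fd, or None if unsupported."""
    if not hasattr(fcntl, "F_GETPIPE_SZ"):
        return None
    return fcntl.fcntl(fd, fcntl.F_GETPIPE_SZ)


def set_pipe_capacity(fd, size):
    """Request a new pipe buffer size; returns the size the kernel granted, or None."""
    if not hasattr(fcntl, "F_SETPIPE_SZ"):
        return None
    # Unprivileged callers get PermissionError above /proc/sys/fs/pipe-max-size.
    return fcntl.fcntl(fd, fcntl.F_SETPIPE_SZ, size)
```

On a typical x86-64 Linux system, `pipe_capacity()` on a fresh `os.pipe()` reports 65536 (16 pages of 4 KiB), matching the PR's computed default; a pipe resized with `set_pipe_capacity()` is exactly the case where the computed value and the real one would diverge.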

gpshead merged commit 74bfb53 into python:main

Aug 31, 2024

bedevere-app bot removed the awaiting core review label

Aug 31, 2024

gpshead (Member) commented Aug 31, 2024

Thanks for taking this on!

methane (Member) commented Aug 31, 2024

> 2. I'm also not familiar with pipes on those systems, but it seems that FreeBSD and MacOS have both a default pipe buffer size of 64KiB: https://www.netmeister.org/blog/ipcbufs.html

This PR uses 256 KiB, not 64 KiB, on M1 Macs (16 KiB pages).
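The arithmetic behind this: the merged change caps pipe reads at 16 pages, so the cap scales with the page size. The page sizes below are 4 KiB (the usual x86-64 Linux size) and the 16 KiB Apple M1 size mentioned here; `pipe_read_cap()` is a hypothetical helper for illustration.

```python
PAGES_PER_PIPE = 16  # the PR's pages-per-pipe multiplier


def pipe_read_cap(page_size, pages=PAGES_PER_PIPE):
    """Read cap in bytes as computed by the merged change: pages * page size."""
    return pages * page_size


for label, page_size in [("4 KiB pages (typical x86-64 Linux)", 4096),
                         ("16 KiB pages (Apple M1 macOS)", 16384)]:
    print(f"{label}: {pipe_read_cap(page_size) // 1024} KiB")
# 4 KiB pages (typical x86-64 Linux): 64 KiB
# 16 KiB pages (Apple M1 macOS): 256 KiB
```

Since the macOS pipe buffer is reported as 64 KiB in the linked post, a page-scaled cap overshoots it on 16 KiB-page machines, which appears to be what #123559 addresses by switching to a fixed 64 KiB buffer size.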

methane mentioned this pull request

Sep 1, 2024

gh-121313: multiprocessing: change connection buffer size to 64KiB #123559

Merged

vstinner (Member) commented Sep 2, 2024

The changelog entry was added to the C API category instead of the Library category.

methane (Member) commented Sep 2, 2024

Nice catch. I will change the category in #123559.


8 participants
