
gh-120754: Refactor I/O modules to stash whole stat result rather than individual members #123412


Merged

vstinner merged 7 commits into python:main from cmaloney:cmaloney/stat_atopen

Sep 18, 2024


Conversation

Contributor

cmaloney commented Aug 28, 2024

As I was working on gh-120754, I noticed I kept adding more members and copying out individual members from the fstat call, and that it may be simpler to just stash (and invalidate) the whole stat result rather than individual members. This is preparatory work for:

- Avoiding calling isatty on open for regular files (resolving "Avoid calling isatty() for most open() calls", #90102)
- Reducing system calls by making more members available (helping implement "Speed up open().read() pattern by reducing the number of system calls", #120754)

One important note, and the reason the member is called stat_atopen, is that the values should only be used as guidance / an estimate. This was also the case when individual members were copied out. While it's common for a file not to be modified while Python code is reading it, other processes could interact with it, and code needs to handle that. Two examples I've come across: it is possible to change an fd so that its isatty result changes (see gh-90102 and GH-121941), and an fd opened in blocking mode may have an ioctl used on it to change it to non-blocking (see gh-109523). This general class of bugs is commonly called time-of-check to time-of-use (TOCTOU, https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use).
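To illustrate the "estimate only" point, here is a minimal Python sketch (not the code in this PR). The helper names `read_all` and `demo_file_grows_after_stat` and the 8192-byte follow-up read size are assumptions made for the example. The cached size only chooses the first read size; end of file is always decided by what `os.read` returns.

```python
import os
import tempfile


def read_all(fd, stat_atopen):
    """Read fd to EOF, treating the cached st_size purely as a sizing hint."""
    # Hypothetical helper: stat_atopen is the os.stat_result captured at open,
    # or None if it was invalidated. The file may have changed since then.
    hint = stat_atopen.st_size if stat_atopen is not None else 0
    bufsize = max(hint, 1)
    chunks = []
    while True:
        chunk = os.read(fd, bufsize)
        if not chunk:
            # EOF is decided by read() returning nothing, never by the hint.
            break
        chunks.append(chunk)
        bufsize = 8192  # arbitrary size for any follow-up reads
    return b"".join(chunks)


def demo_file_grows_after_stat():
    """Show that data appended after the fstat() call is still read."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(b"abc")
        fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
        try:
            stat_atopen = os.fstat(fd)
            # Another writer appends after the stat result was captured.
            with open(path, "ab") as f:
                f.write(b"defgh")
            return stat_atopen.st_size, read_all(fd, stat_atopen)
        finally:
            os.close(fd)
```

In the demo the cached size is stale by the time the read starts, yet the appended bytes are still returned because the loop never trusts the hint for termination.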

Given how common some specific patterns are (e.g. Path().read_bytes()), it is still worthwhile to optimize those (e.g. disabling buffering results in an over 10% speedup in that case, GH-122111). The existing code paths treated this correctly as far as I can tell.

This PR is a portion of GH-121593, which is being split up into smaller, hopefully easier to review chunks. Not calling isatty for regular files makes a small but measurable perf improvement for every "open and read whole regular file" Python does.
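The isatty idea can be sketched roughly as follows. This is an illustration under assumptions, not the CPython implementation: `isatty_with_hint` and `demo_regular_file` are hypothetical names, and the reasoning relies on a terminal being a character device, so a stat result reporting any other file type rules a TTY out without a system call. Because the stat result is only a snapshot, a check like this is only reasonable right at open time.

```python
import os
import stat
import tempfile


def isatty_with_hint(fd, stat_atopen):
    """Skip the isatty() system call when the cached stat rules out a TTY."""
    # Hypothetical helper. A terminal is a character device, so a regular
    # file, directory, pipe, etc. cannot be a TTY.
    if stat_atopen is not None and not stat.S_ISCHR(stat_atopen.st_mode):
        return False
    # No usable hint, or a character device: ask the OS.
    return os.isatty(fd)


def demo_regular_file():
    """Return the answers with and without the hint for a regular file."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        with_hint = isatty_with_hint(fd, os.fstat(fd))
        without_hint = isatty_with_hint(fd, None)
        return with_hint, without_hint
```

Both paths give the same answer for a regular file; the difference is that the hinted path never calls `os.isatty`.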

cmaloney added 2 commits

August 27, 2024 16:53

pythongh-120754: Refactor _io to stash whole stat

508aa9d

Multiple places in the I/O stack optimize common cases by using the information from stat. Currently individual members are extracted from the stat and stored into the fileio struct. Refactor the code to store the whole stat struct instead.

pythongh-120754: Refactor _pyio to stash whole stat

9d849ce

Parallels the changes to _io. The `stat` Python object doesn't allow changing members, so rather than modifying estimated_size, just clear the value.
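A small sketch of why the Python side clears the cached value instead of editing it. `StatCachingFile` and `demo_invalidate` are hypothetical stand-ins, not the `_pyio` classes. Fields of `os.stat_result` are read-only, so once the file is known to have changed, the simplest option is to drop the whole cached result and treat `None` as "no estimate".

```python
import os
import tempfile


class StatCachingFile:
    """Hypothetical stand-in for a file object that caches fstat() at open."""

    def __init__(self, fd):
        self._fd = fd
        self._stat_atopen = os.fstat(fd)

    def estimated_size(self):
        # None means "no estimate available"; callers must cope with that.
        if self._stat_atopen is None:
            return None
        return self._stat_atopen.st_size

    def truncate(self, size):
        os.ftruncate(self._fd, size)
        # The cached fields cannot be edited, so invalidate the whole result.
        self._stat_atopen = None


def demo_invalidate():
    with tempfile.TemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        cached = StatCachingFile(f.fileno())
        before = cached.estimated_size()
        try:
            cached._stat_atopen.st_size = 0
            writable = True
        except (AttributeError, TypeError):
            writable = False
        cached.truncate(2)
        return before, writable, cached.estimated_size()
```

After `truncate`, the estimate is gone rather than wrong, which keeps any later size-based optimization from acting on stale data.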

bedevere-appbot mentioned this pull request

Aug 28, 2024

Speed up open().read() pattern by reducing the number of system calls #120754

Closed

bedevere-appbot added the awaiting review label

Aug 28, 2024

cmaloney mentioned this pull request

Aug 28, 2024

GH-120754: Remove isatty call during regular open #121593

Closed

Contributor, Author

cmaloney commented Aug 28, 2024

Could this get the no news tag? (This is changing / refactoring an implementation detail)

picnixz added the skip news label

Aug 28, 2024

vstinner reviewed

Aug 28, 2024


cmaloney and others added 4 commits

August 28, 2024 17:20

Apply suggestions from code review

bfcfcf2

Co-authored-by: Victor Stinner <vstinner@python.org>

Add comments around stat_atopen and why bufsize is + 1

3122665

Apply review changes for _pyio _blkszie

8f5cfe4

Fix comment formatting

d18a82d

vstinner approved these changes

Aug 29, 2024

Member

vstinner left a comment

LGTM.

@gpshead @serhiy-storchaka @pitrou: Would you mind having a look?


bedevere-appbot added awaiting merge and removed awaiting review labels

Aug 29, 2024

Add _pyiio +1 comment to fileio for better clarity

c55d10e

vstinner merged commit 8b6c7c7 into python:main

Sep 18, 2024

bedevere-appbot removed the awaiting merge label

Sep 18, 2024

Member

vstinner commented Sep 18, 2024

Ok, I merged your change. Thanks for your contribution. Let's see how it goes :-)

cmaloney deleted the cmaloney/stat_atopen branch

September 18, 2024 19:04

Contributor, Author

cmaloney commented Sep 18, 2024

Looking at individual buildbots, I'm seeing some test_io refleak failures (https://buildbot.python.org/#/builders/259/builds/1384, https://buildbot.python.org/#/builders/551/builds/78); digging in a bit.

zware mentioned this pull request

Sep 18, 2024

Fix make htmllive target #124219

Merged

Member

vstinner commented Sep 18, 2024

Using test.bisect_cmd, I identified the leaking test:

$ ./python -m test test_io -R 3:3 -m test.test_io.CIOTest.test_fileio_closefd
(...)
test_io leaked [1, 1, 1] memory blocks, sum=3
(...)

Member

vstinner commented Sep 18, 2024

> Looking at individual buildbots, seeing some test_io refleaks failures (https://buildbot.python.org/#/builders/259/builds/1384, https://buildbot.python.org/#/builders/551/builds/78), digging in a bit.

I wrote a fix: PR gh-124225.

gpshead reviewed

Sep 18, 2024


Modules/_io/fileio.c

get_blksize(fileio *self, void *closure)
{
#ifdef HAVE_STRUCT_STAT_ST_BLKSIZE
    if (self->stat_atopen != NULL && self->stat_atopen->st_blksize > 1) {

Member

gpshead Sep 18, 2024

I do wonder how realistic the st_blksize values, when available, are for performance purposes. I guess we'll find out.

Member

vstinner Sep 18, 2024

This PR should not change the buffer size, does it?

Contributor, Author

cmaloney Sep 18, 2024

#117151 (comment) investigated st_blksize a bit previously. In this PR I tried not to change the buffer size at all, just how it is accessed.

With the refactors and optimizations, I have been watching for new issues, and am finding some as people test main (e.g. gh-113977, for which I wrote a primary fix in #122101, and I have more fix ideas on top of the stat_atopen changes).
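For readers following the st_blksize thread, the check in the reviewed C snippet can be approximated in Python as below. This is a sketch under assumptions: `blksize_from_stat` is a hypothetical helper, and falling back to `io.DEFAULT_BUFFER_SIZE` when the cached stat is missing or its block size is not greater than 1 is an assumption about the surrounding code, which the snippet does not show.

```python
import io
from types import SimpleNamespace


def blksize_from_stat(stat_atopen, default=io.DEFAULT_BUFFER_SIZE):
    """Pick a buffer size from a cached stat result, with a safe fallback."""
    # Hypothetical helper mirroring the reviewed condition:
    # stat_atopen != NULL && stat_atopen->st_blksize > 1
    if stat_atopen is None:
        return default
    # st_blksize is not available on every platform, hence getattr().
    blksize = getattr(stat_atopen, "st_blksize", 0)
    if blksize > 1:
        return blksize
    return default


# Usage with stand-in objects instead of real os.stat_result values.
example_with_hint = blksize_from_stat(SimpleNamespace(st_blksize=4096))
example_invalidated = blksize_from_stat(None)
```

Keeping the fallback in one place means an invalidated or unhelpful stat result only changes where the value comes from, not how callers use it.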

savannahostrowski pushed a commit to savannahostrowski/cpython that referenced this pull request

Sep 22, 2024

pythongh-120754: Refactor I/O modules to stash whole stat result rather than individual members (python#123412)

b97ec9c

Multiple places in the I/O stack optimize common cases by using the information from stat. Currently individual members are extracted from the stat and stored into the fileio struct. Refactor the code to store the whole stat struct instead.

Parallels the changes to _io. The `stat` Python object doesn't allow changing members, so rather than modifying estimated_size, just clear the value.

This was referenced Oct 3, 2024

gh-90102: Remove isatty call during regular open #124922

Merged

gh-117151: IO performance improvement, increase io.DEFAULT_BUFFER_SIZE to 128k #118144

Merged

Labels

skip news


4 participants