Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-120754: Refactor I/O modules to stash whole stat result rather than individual members#123412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
vstinner merged 7 commits intopython:mainfromcmaloney:cmaloney/stat_atopen
Sep 18, 2024

Conversation

@cmaloney
Copy link
Contributor

As I was working ongh-120754 I noticed I kept adding more members and copying out individual members from thefstat call, and that it may be simpler / easier to just stash (and invalidate) the whole stat result rather than individul members. This is preparatory work for

  1. Avoid callingisatty on open for regular files (ResolvingAvoid calling isatty() for most open() calls #90102)
  2. Reduce system calls by making more members available (Helping implementSpeed up open().read() pattern by reducing the number of system calls #120754)

One important note, and why the member is calledstat_atopen is that the values should only be used as guidance / an estimate. With individual members copied out this was also the case. While it's common for a file to not be modified while python code is reading it, other processes could interact with it and code needs to handle that. Two examples of this that I've come across: It is possible to change anfd soisatty result changes (see:gh-90102 andGH-121941) and afd which is opened in blocking mode may have anioctl used on it to change it to non-blocking (see:gh-109523). The general class of bugs here are commonly called time-of-check to time of use (TOCTOU,https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use)

Given how common some specific patterns are (ex.Path().read_bytes()) it is still worthwhile to optimize those (Ex. disabling buffering results in a over 10% speedup in that case,GH-122111). The existing codepaths treated this correctly as far as I can tell.

This PR is a portion ofGH-121593 which is being split up into smaller, hopefully easier to review chunks. Not callingisatty for regular files makes a small but measurable perf improvement for every "open and read whole regular file" python does.

gpshead reacted with thumbs up emoji
Multiple places in the I/O stack optimize common cases by using theinformation from stat. Currently individual members are extracted fromthe stat and stored into the fileio struct. Refactor the code to storethe whole stat struct instead.
Parallels the changes to _io. The `stat` Python object doesn't allowchanging members, so rather than modifying estimated_size, just clearthe value.
@cmaloney
Copy link
ContributorAuthor

Could this get the no news tag? (This is changing / refactoring an implementation detail)

Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM.

@gpshead@serhiy-storchaka@pitrou: Would you mind to have a look?

@vstinnervstinner merged commit8b6c7c7 intopython:mainSep 18, 2024
@vstinner
Copy link
Member

Ok, I merged your change. Thanks for your contribution. Let's see how it goes :-)

cmaloney reacted with thumbs up emoji

@cmaloneycmaloney deleted the cmaloney/stat_atopen branchSeptember 18, 2024 19:04
@cmaloney
Copy link
ContributorAuthor

Looking at individual buildbots, seeing sometest_io refleaks failures (https://buildbot.python.org/#/builders/259/builds/1384,https://buildbot.python.org/#/builders/551/builds/78), digging in a bit.

@zwarezware mentioned this pull requestSep 18, 2024
@vstinner
Copy link
Member

Using test.bisect_cmd, I identified the leaking test:

$ ./python -m test test_io -R 3:3 -m test.test_io.CIOTest.test_fileio_closefd(...)test_io leaked [1, 1, 1] memory blocks, sum=3(...)
cmaloney reacted with thumbs up emoji

@vstinner
Copy link
Member

Looking at individual buildbots, seeing some test_io refleaks failures (https://buildbot.python.org/#/builders/259/builds/1384,https://buildbot.python.org/#/builders/551/builds/78), digging in a bit.

I wrote a fix: PRgh-124225.

get_blksize(fileio*self,void*closure)
{
#ifdefHAVE_STRUCT_STAT_ST_BLKSIZE
if (self->stat_atopen!=NULL&&self->stat_atopen->st_blksize>1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I do wonder how realistic the st_blksize values, when available, are for performance purposes, I guess we'll find out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This PR should not change the buffer size, does it?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

#117151 (comment) investigatedst_blksize a bit previously. This PR I tried not to change buffer size at all / just change how it is accessed.

Have with the refactors + optimizations been watching for new issues. Are finding some as people testmain (ex.gh-113977 which I wrote a primary fix for#122101, and have more fix ideas on top of the stat_atopen changes)

gpshead reacted with thumbs up emoji
savannahostrowski pushed a commit to savannahostrowski/cpython that referenced this pull requestSep 22, 2024
…er than individual members (python#123412)Multiple places in the I/O stack optimize common cases by using theinformation from stat. Currently individual members are extracted fromthe stat and stored into the fileio struct. Refactor the code to storethe whole stat struct instead.Parallels the changes to _io. The `stat` Python object doesn't allowchanging members, so rather than modifying estimated_size, just clearthe value.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@gpsheadgpsheadgpshead left review comments

@vstinnervstinnervstinner approved these changes

Assignees

No one assigned

Labels

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@cmaloney@vstinner@gpshead@picnixz

[8]ページ先頭

©2009-2025 Movatter.jp