Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.4k
gh-120754: Refactor I/O modules to stash whole stat result rather than individual members#123412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Multiple places in the I/O stack optimize common cases by using theinformation from stat. Currently individual members are extracted fromthe stat and stored into the fileio struct. Refactor the code to storethe whole stat struct instead.
Parallels the changes to _io. The `stat` Python object doesn't allowchanging members, so rather than modifying estimated_size, just clearthe value.
cmaloney commentedAug 28, 2024
Could this get the no news tag? (This is changing / refactoring an implementation detail) |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM.
@gpshead@serhiy-storchaka@pitrou: Would you mind to have a look?
Uh oh!
There was an error while loading.Please reload this page.
vstinner commentedSep 18, 2024
Ok, I merged your change. Thanks for your contribution. Let's see how it goes :-) |
cmaloney commentedSep 18, 2024
Looking at individual buildbots, seeing some |
vstinner commentedSep 18, 2024
Using test.bisect_cmd, I identified the leaking test: |
vstinner commentedSep 18, 2024
I wrote a fix: PRgh-124225. |
| get_blksize(fileio*self,void*closure) | ||
| { | ||
| #ifdefHAVE_STRUCT_STAT_ST_BLKSIZE | ||
| if (self->stat_atopen!=NULL&&self->stat_atopen->st_blksize>1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I do wonder how realistic the st_blksize values, when available, are for performance purposes, I guess we'll find out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This PR should not change the buffer size, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
#117151 (comment) investigatedst_blksize a bit previously. This PR I tried not to change buffer size at all / just change how it is accessed.
Have with the refactors + optimizations been watching for new issues. Are finding some as people testmain (ex.gh-113977 which I wrote a primary fix for#122101, and have more fix ideas on top of the stat_atopen changes)
…er than individual members (python#123412)Multiple places in the I/O stack optimize common cases by using theinformation from stat. Currently individual members are extracted fromthe stat and stored into the fileio struct. Refactor the code to storethe whole stat struct instead.Parallels the changes to _io. The `stat` Python object doesn't allowchanging members, so rather than modifying estimated_size, just clearthe value.
As I was working ongh-120754 I noticed I kept adding more members and copying out individual members from the
fstatcall, and that it may be simpler / easier to just stash (and invalidate) the whole stat result rather than individul members. This is preparatory work forisattyon open for regular files (ResolvingAvoid calling isatty() for most open() calls #90102)One important note, and why the member is called
stat_atopenis that the values should only be used as guidance / an estimate. With individual members copied out this was also the case. While it's common for a file to not be modified while python code is reading it, other processes could interact with it and code needs to handle that. Two examples of this that I've come across: It is possible to change anfdsoisattyresult changes (see:gh-90102 andGH-121941) and afdwhich is opened in blocking mode may have anioctlused on it to change it to non-blocking (see:gh-109523). The general class of bugs here are commonly called time-of-check to time of use (TOCTOU,https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use)Given how common some specific patterns are (ex.
Path().read_bytes()) it is still worthwhile to optimize those (Ex. disabling buffering results in a over 10% speedup in that case,GH-122111). The existing codepaths treated this correctly as far as I can tell.This PR is a portion ofGH-121593 which is being split up into smaller, hopefully easier to review chunks. Not calling
isattyfor regular files makes a small but measurable perf improvement for every "open and read whole regular file" python does.