
GH-120754: Remove isatty call during regular open #121593


Closed

Conversation

@cmaloney (Contributor) commented Jul 10, 2024 (edited by bedevere-app bot)

On POSIX, TTYs are never regular files, so if the interpreter knows the file is regular, it doesn't need an additional system call to check whether the file is a TTY.
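As a hedged illustration (not the PR's C code, which lives in `_io.FileIO`), the invariant can be sketched in Python; `tty_check_needed` is a hypothetical helper name:

```python
import os
import stat

def tty_check_needed(fd):
    """Return True only when an isatty() syscall is still required.

    On POSIX a TTY is always a character device, so a regular file
    (S_ISREG) can never be a TTY and the extra syscall can be skipped.
    """
    st = os.fstat(fd)  # open() already needs this stat result anyway
    return not stat.S_ISREG(st.st_mode)
```

For a freshly opened regular file this returns False, which is exactly why the PR can drop the `isatty` call in the common case.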

The `open()` Python builtin currently requires a `stat` call in order to ensure the file being opened isn't a directory. That result includes the file mode, which tells us whether it is a regular file. A number of attributes from the stat are currently stashed one-off; this change moves to stashing the whole object rather than individual members.

The stat object is reasonably large, and currently the `stat_result.st_size` member cannot be modified from Python, which the `_pyio` implementation needs, so make the whole stat object optional. In the `_io` implementation this makes handling a stat failure simpler. At present there is no explicit user call to clear it, but if one is needed (ex. a program which has a lot of open FileIO objects and the memory becomes a problem) it would be straightforward to add. Ideally it would be cleared automatically (the values are generally used during I/O object initialization and not after; after a `write` they are no longer useful in current cases).

It is a fairly common pattern to scan a directory, look at the `stat` results (ex. has this file changed), and then open/read the file. In this PR I didn't update open's API to allow passing in a stat result to use, but that could be beneficial for some cases (ex. `importlib`).
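For illustration, the scan-then-open pattern mentioned above might look like the sketch below (the helper name and the mtime-based change policy are assumptions, not from the PR). `os.scandir` hands back a `DirEntry` whose `stat()` is cached after the first call, so only `open()` itself repeats the stat today:

```python
import os

def changed_files(root, last_mtimes):
    """Yield (path, stat_result) for regular files whose mtime is newer
    than what a previous scan recorded in last_mtimes."""
    for entry in os.scandir(root):
        if not entry.is_file():
            continue
        st = entry.stat()  # DirEntry caches this after the first call
        if st.st_mtime > last_mtimes.get(entry.path, 0):
            yield entry.path, st
```

A hypothetical `open(path, stat_result=st)` API would let the final read reuse `st` instead of re-stating the file.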

With this change on my Linux machine reading a small plain text file is down to 6 system calls.

```
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```

Performance

On my Mac:

`python bm_readall.py`:

```python
import pyperf
from pathlib import Path

def read_all(all_paths):
    for p in all_paths:
        p.read_bytes()

def read_file(path_obj):
    path_obj.read_text()

all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found rst files"
assert all_py, "Should have found python source files"

runner = pyperf.Runner()
runner.bench_func("read_file_small", read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func("read_file_large", read_file, Path("Doc/c-api/typeobj.rst"))
runner.bench_func("read_all_rst", read_all, all_rst)
runner.bench_func("read_all_py", read_all, all_py)
```

main

```
read_file_small: Mean +- std dev: 7.89 us +- 0.06 us
read_file_large: Mean +- std dev: 21.1 us +- 0.3 us
read_all_rst: Mean +- std dev: 4.17 ms +- 0.07 ms
read_all_py: Mean +- std dev: 19.3 ms +- 0.2 ms
```

cmaloney/stash_fstat

```
read_file_small: Mean +- std dev: 7.47 us +- 0.05 us
read_file_large: Mean +- std dev: 20.5 us +- 0.3 us
read_all_rst: Mean +- std dev: 3.92 ms +- 0.07 ms
read_all_py: Mean +- std dev: 18.4 ms +- 0.2 ms
```

On Linux the performance change is in the noise on my machine. For both macOS and Linux I suspect removing the remaining `lseek` will provide a bit more performance swing based on profiles, but it also touches very differently shaped code than the system call removals so far.

@cmaloney changed the title from *GH-120754: Remove isatty call during regular read* to *GH-120754: Remove isatty call during regular open* on Jul 10, 2024
@cmaloney (Contributor, Author)

The WASI and x86 Windows failures are around a large readall on a large zip file; working on figuring out what I changed / broke for those cases.

@cmaloney marked this pull request as draft July 11, 2024 05:14
@serhiy-storchaka (Member)

See also #90102.


@cmaloney (Contributor, Author) commented Jul 11, 2024 (edited)

@serhiy-storchaka Cool! I hadn't realized there were a couple of attempts at that previously. This doesn't change the behavior of the `isatty` call at all, and the isatty check is only optimized during `__init__`, so I think it is safe in terms of what didn't quite work with #112495 (changes to the underlying file/fd after open; the stat + skipping isatty happen inside the same single library call in this PR).

I don't have a strong preference among my "is regular" check vs. S_ISCHR (#112495) vs. size. If there's one which is best to use, or a combination, I'm happy to implement that.
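The three candidate signals can be compared side by side with a small sketch (`classify` is a hypothetical helper, not code from either PR):

```python
import os
import stat

def classify(fd):
    """Report, for one file descriptor, the three checks discussed above."""
    st = os.fstat(fd)
    return {
        "is_regular": stat.S_ISREG(st.st_mode),  # this PR: regular => not a TTY
        "is_chardev": stat.S_ISCHR(st.st_mode),  # gh-112495: TTYs are char devices
        "size": st.st_size,                      # size-based heuristic
    }
```

For a regular file, `is_regular` is True and `is_chardev` is False; the two mode checks differ only for file types that are neither regular nor character devices (FIFOs, sockets, block devices).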

@cmaloney marked this pull request as ready for review July 11, 2024 07:25
@cmaloney (Contributor, Author) commented Jul 11, 2024 (edited)

The test_zipimport failures highlighted a couple of issues which I think should be solved, but separately from this PR (which keeps the behavior from before it):

1. If the size is large, then this code, with slight hard-to-notice/read changes (which do break buildbots), allocates more than the maximum byte string size:

   ```c
   if ((size_t)size > (size_t)PY_SSIZE_T_MAX - PyBytesObject_SIZE) {
       PyErr_SetString(PyExc_OverflowError,
                       "byte string is too large");
       return NULL;
   }
   ```

   vs.

   ```c
   #if defined(MS_WINDOWS) || defined(__APPLE__)
   /* On Windows, the count parameter of read() is an int (bpo-9015, bpo-9611).
      On macOS 10.13, read() and write() with more than INT_MAX bytes
      fail with EINVAL (bpo-24658). */
   #  define _PY_READ_MAX  INT_MAX
   #  define _PY_WRITE_MAX INT_MAX
   #else
   /* write() should truncate the input to PY_SSIZE_T_MAX bytes,
      but it's safer to do it ourself to have a portable behaviour */
   #  define _PY_READ_MAX  PY_SSIZE_T_MAX
   #  define _PY_WRITE_MAX PY_SSIZE_T_MAX
   #endif
   ```

   This is currently worked around by "if the size is large, always use a dynamically resized buffer", and the end reads end up being small (ex. 1 byte for `test_largefile`). That points at two things I would like to solve in separate PRs (this PR keeps the same behavior around `_PY_READ_MAX` + estimated size as before):

   1. It would be good to have a test in `test_fileio` around this behavior rather than relying on other modules. Specifically: on a very large / max-size file, seek to near the end of the file, then do a `.read()`. Also truncate the file, then do a read.
   2. The zipfile module relies on a read without a size after seeking to a certain distance from the end of the file, and discards the data if the size doesn't match what it expects exactly. That would work more efficiently as a fixed-size read.

2. In the CPython codebase, a seek after open followed by a readall / read without size occurs multiple times, which likely means it is common elsewhere as well.

   1. Should improve behavior for that (likely either seek and truncate should invalidate the cached size, or some general "if large, readall must call fstat to get up-to-date info").
   2. Should add a test that we don't over-allocate, which can lead to an OOM on smaller-memory boxes in these cases (this is what broke the bot in gh-120754: Update estimated_size in C truncate #121357).

3. The dynamic buffer resizing doesn't pay attention to the max PyBytes size and should be improved. Currently it tries allocating `_PY_READ_MAX`, which is bigger than the biggest possible bytes object, resulting in a "byte string is too large" failure.
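Point 3 amounts to clamping the initial allocation. A sketch with illustrative constants (the real limits are platform-dependent C macros, and the exact bytes-object overhead varies; these Python values are stand-ins):

```python
# Illustrative stand-ins for the C macros discussed above.
_PY_READ_MAX = 2**31 - 1          # INT_MAX (the Windows/macOS per-read cap)
_MAX_BYTES_SIZE = 2**63 - 1 - 32  # roughly PY_SSIZE_T_MAX minus bytes header

def initial_readall_size(estimated_size):
    """Clamp the first buffer allocation so a huge size estimate can
    neither exceed the per-read cap nor overflow the bytes object."""
    return min(estimated_size, _PY_READ_MAX, _MAX_BYTES_SIZE)
```

With a clamp like this, a multi-terabyte size estimate still produces a feasible first allocation instead of an immediate "byte string is too large" failure.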

cmaloney added a commit to cmaloney/cpython that referenced this pull request Jul 21, 2024

In the process of speeding up readall, a number of related tests (ex. large file tests in test_zipfile) found problems with the change I was making. This adds I/O tests to specifically test these cases to help ensure they don't regress and hopefully make debugging easier. This is part of the improvements from python#121593 (comment)
@cmaloney
Copy link
ContributorAuthor

Created PRs to improve:

1. zipfile, to not do unbounded reads -- gh-113977, gh-120754: Remove unbounded reads from zipfile #122101
2. explicit I/O tests around seek + readall on a large file (and a comment on the existing truncate + readall) -- GH-120754: Add more tests around seek + readall #122103

hauntsaninja pushed a commit that referenced this pull request Jul 24, 2024

nohlson pushed a commit to nohlson/cpython that referenced this pull request Jul 24, 2024

nohlson pushed a commit to nohlson/cpython that referenced this pull request Jul 24, 2024
@vstinner (Member) left a comment

This PR is hard to review since it changes multiple things at once. Would it be possible to only introduce `stat_atopen` and use it to get the block size? (Create a first, smaller PR.)

@cmaloney (Contributor, Author)

@vstinner I have opened a new PR, GH-123412, which contains just the refactor to a `stat_atopen` member and the memory allocation management of that new member. It has two commits, one for `_io` and a second for `_pyio`, trying to keep those in sync while allowing the changes to be reviewed side by side. I'll work on a new PR to skip checking `isatty` for common regular files on top of that.

@cmaloney deleted the cmaloney/stash_fstat_pr branch November 2, 2024 09:32

Reviewers

@vstinner left review comments

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@cmaloney @serhiy-storchaka @vstinner
