Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-101196: Make isdir/isfile/exists faster on Windows#101324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
zooba merged 32 commits intopython:mainfrommdboom:win-isdir-fastpath
Feb 8, 2023

Conversation

@mdboom
Copy link
Contributor

@mdboommdboom commentedJan 25, 2023
edited
Loading

This makesisdir,isfile andexists faster on Windows, by making fewer API calls than callingos.stat and then looking at that result.

On the following microbenchmarks:

Microbenchmarks
import os.pathimport timeitfor i in range(100):    os.makedirs(f"exists{i}", exist_ok=True)    os.symlink(f"exists{i}", f"symlink{i}", target_is_directory=True)for i in range(100):    with open(f"file{i}.txt", "w") as fd:        fd.write('x' * 100)    os.symlink(f"file{i}.txt", f"file_symlink{i}.txt")def test_isdir_exists():    isdir = os.path.isdir    for i in range(100):        isdir(f"exists{i}")def test_isdir_symlink():    isdir = os.path.isdir    for i in range(100):        isdir(f"symlink{i}")def test_isdir_missing():    isdir = os.path.isdir    for i in range(100):        isdir(f"missing{i}")def test_isfile_exists():    isfile = os.path.isfile    for i in range(100):        isfile(f"file{i}.txt")def test_isfile_symlink():    isfile = os.path.isfile    for i in range(100):        isfile(f"file_symlink{i}.txt")def test_isfile_missing():    isfile = os.path.isfile    for i in range(100):        isfile(f"missing{i}")def test_exists_exists():    exists = os.path.exists    for i in range(50):        exists(f"file{i}.txt")        exists(f"exists{i}")def test_exists_symlink():    exists = os.path.exists    for i in range(50):        exists(f"file_symlink{i}.txt")        exists(f"symlink{i}")def test_exists_missing():    exists = os.path.exists    for i in range(100):        exists(f"missing{i}")print("isdir exists:", timeit.timeit(test_isdir_exists, number=200))print("isdir symlink:", timeit.timeit(test_isdir_symlink, number=200))print("isdir missing:", timeit.timeit(test_isdir_missing, number=200))print("isfile exists:", timeit.timeit(test_isfile_exists, number=200))print("isfile symlink:", timeit.timeit(test_isfile_symlink, number=200))print("isfile missing:", timeit.timeit(test_isfile_missing, number=200))print("exists exists:", timeit.timeit(test_exists_exists, number=200))print("exists symlink:", timeit.timeit(test_exists_symlink, number=200))print("exists missing:", timeit.timeit(test_exists_missing, number=200))for i in range(100):    os.unlink(f"symlink{i}")    os.unlink(f"file_symlink{i}.txt")for i in range(100):    os.rmdir(f"exists{i}")for i in range(100):    os.unlink(f"file{i}.txt")

I get the following results:

benchmarkmainthis PRratio
isdir exists27.3 us23.0 us0.84
isdir symlink38.0 us33.1 us0.87
isdir missing9.1 us6.5 us0.72
isfile exists20.7 us16.7 us0.81
isfile symlink32.4 us28.1 us0.87
isfile missing9.0 us6.7 us0.74
exists exists24.1 us19.2 us0.8
exists symlink35.6 us29.4 us0.83
exists missing9.0 us6.5 us0.72

So therefore, these operations are around 13%-28% faster, depending on whether the file exists, is a symlink etc.

NOTE: This does not improve the performance ofpathlib.Path.is_dir and friends. The behavior there is slightly different in terms of error handling, but I think a similar approach could also be applied there. If this PR is acceptable, I plan to do that as follow-up work.

@eryksun
Copy link
Contributor

A builtin_islink() implementation would open the path using the flagsFILE_FLAG_OPEN_REPARSE_POINT | FILE_FLAG_BACKUP_SEMANTICS and get theFileAttributeTagInfo. It's a symlink if the query succeeds andReparseTag isIO_REPARSE_TAG_SYMLINK. Note thatIO_REPARSE_TAG_MOUNT_POINT (i.e. a junction) is intentionally excluded fromislink().

@eryksun
Copy link
Contributor

A few cases that are handled bywin32_xstat_impl() should be supported. Perhaps a common function could be implemented that factors out the initial open fromwin32_xstat_impl() andattributes_from_dir() -- e.g.win32_stat_open(const wchar_t *path, BOOL traverse, WIN32_FIND_DATAW *find_data, BOOL *find_data_updated). Or perhaps simply fall back onwin32_xstat_impl() ifCreateFileW() fails with one of the error codes discussed below.


  • ERROR_ACCESS_DENIED andERROR_SHARING_VIOLATION: Fall back on usingFindFirstFileW(). Internally, this opens the parent directory viaNtOpenFile() and queries a directory entry for the name usingNtQueryDirectoryFileEx(). The query must fail or return false if the directory entry is a reparse point, unless it's a name-surrogate reparse point (e.g. a symlink or junction) that was supposed to be opened. Here are some cases where this can occur:
    • The file security may grant no access to the caller. Microsoft's filesystems implement a policy that always grantsFILE_READ_ATTRIBUTES access if the caller is grantedFILE_READ_DATA access to the parent directory. However,CreateFileW() also requestsSYNCHRONIZE access.
    • No access is granted to a file that's mapped as a system paging file.
    • No access is granted to a file or directory that has been deleted but is not unlinked, which is the case if there are existing opens and POSIX delete isn't supported by
      • the OS version (e.g. Windows 8),
      • the API function (e.g.RemoveDirectoryW),
      • or the filesystem driver (e.g. exFAT and FAT32).
  • ERROR_CANT_ACCESS_FILE: Traversing a reparse point in a path may not be supported because it was created by software on another system (e.g. on a portable drive), or the driver/service that supports the reparse point is no longer installed or running, or it was simply never intended for automatic traversal (e.g.IO_REPARSE_TAG_APPEXECLINK). In this case, the open should be retried with the flagFILE_FLAG_OPEN_REPARSE_POINT. If it turns out to be a name-surrogate reparse point, the query should be failed with the original error code,ERROR_CANT_ACCESS_FILE, as a broken symlink(ish) file.
  • ERROR_INVALID_PARAMETER: A device open may requireGENERIC_READ access. For example, opening "\\.\CON" (i.e. "\Device\ConDrv\Console") requires eitherGENERIC_READ orGENERIC_WRITE access. The open has to be retried with the requiredGENERIC_READ access.

@mdboom
Copy link
ContributorAuthor

A few cases that are handled by win32_xstat_impl() should be supported.

I guess this means we don't have test coverage for them...?

In any event, I agree we should probably try to cover these cases. I'll see what can be done.

Or perhaps simply fall back on win32_xstat_impl() if CreateFileW() fails with one of the error codes discussed below.

My initial implementation from this morning did that, which I like because it doesn't repeat what's clearly battle-tested code, but it made thingsslower in the event of symlinks. I was almost ok with that given that that isn't the common case, but your solution seemed to make everything a little faster (ignoring these new corner cases).

@eryksun
Copy link
Contributor

eryksun commentedJan 25, 2023
edited
Loading

Or perhaps simply fall back on win32_xstat_impl() if CreateFileW() fails with one of the error codes discussed below.

My initial implementation from this morning did that, which I like because it doesn't repeat what's clearly battle-tested code, but it made thingsslower in the event of symlinks.

There shouldn't be an issue with speed for common cases as long as you only fall back onwin32_xstat_impl() for the discussed error codes:

  • ERROR_ACCESS_DENIED
  • ERROR_SHARING_VIOLATION
  • ERROR_CANT_ACCESS_FILE
  • ERROR_INVALID_PARAMETER

@unittest.skipIf(sys.platform!='win32',"drive letters are a windows concept")
deftest_isfile_driveletter(self):
current_drive=os.path.splitdrive(os.path.abspath(__file__))[0]+"\\"
self.assertFalse(os.path.isfile(current_drive))
Copy link
Contributor

@eryksuneryksunJan 26, 2023
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The reason this is false is because a relative drive path like "C:" resolves to the working directory on the drive. Did you want to test a volume device path liker'\\.\C:' instead?

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Ah, yes. And this is exactly the thing that (erroneously) returnsTrue without checking the error code fromGetFileInformationByHandleEx.

@mdboom
Copy link
ContributorAuthor

mdboom commentedJan 26, 2023
edited
Loading

@eryksun: Thanks for all of your help and pointers.

I've implemented support for these corner cases by callingwin32_stat under the hood, which is not the most efficient but avoids duplicating the complex logic inwin32_xstat_impl.

I'm a little stuck on writing unit tests for them. I was able to coverERROR_ACCESS_DENIED by extending an existing test forstat to also ensure thatisfile matches it.

I'm not sure how to recreate the case you describe inERROR_CANT_ACCESS_FILE.

ForERROR_INVALID_PARAMETER, I triedisfile(r"\\.\CON") andisdir(r"\\.\CON"), but I get the same result (False) with or without this error handling as well as onmain. So maybe I need to somehow create or haveCON?

@mdboommdboom requested a review fromeryksunJanuary 26, 2023 14:57
Copy link
Member

@zoobazooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks fine to me. Any other POVs?

@erlend-aasland
Copy link
Contributor

Looks good to me, but I'll add that there's a lot of code duplication in here; it might be worth it to refactor the code shortly after landing this PR, so the added code is more maintainable and readable.

@mdboom
Copy link
ContributorAuthor

Looks good to me, but I'll add that there's a lot of code duplication in here; it might be worth it to refactor the code shortly after landing this PR, so the added code is more maintainable and readable.

I'll take a first pass at reducing some of this duplication.

erlend-aasland reacted with thumbs up emoji

@zooba
Copy link
Member

My read of the duplication is that there's just enough difference that it's probably not worth it. The main change I'd like to see (but wasn't requiring) is to add an option topath_t for Argument Clinic to let it fall through in the case of errors, rather than raising.

If all the additional functions were using the same OS call, it would make sense to merge them all. But they're subtly different each time, and since this is for perf, we don't want to introduce more switches.

erlend-aasland reacted with thumbs up emoji

@erlend-aasland
Copy link
Contributor

erlend-aasland commentedFeb 6, 2023
edited
Loading

The main change I'd like to see (but wasn't requiring) is to add an option topath_t for Argument Clinic to let it fall through in the case of errors, rather than raising.

+1 (I suggest adding that in a follow-up PR; let's land this.)

If all the additional functions were using the same OS call, it would make sense to merge them all. But they're subtly different each time, and since this is for perf, we don't want to introduce more switches.

True that.

@eryksun
Copy link
Contributor

Michael, are you still working on refactoring the code? Steve and Erlend seem to be in agreement that it's ready to merge as is. I think it's ready.

erlend-aasland reacted with thumbs up emoji

@zooba
Copy link
Member

Just heard that it's good to go as it is.

erlend-aasland reacted with rocket emoji

@zoobazooba merged commit86ebd5c intopython:mainFeb 8, 2023
@zoobazooba added needs backport to 3.10only security fixes needs backport to 3.11only security fixes labelsFeb 8, 2023
@miss-islington
Copy link
Contributor

Thanks@mdboom for the PR, and@zooba for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks@mdboom for the PR, and@zooba for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry,@mdboom and@zooba, I could not cleanly backport this to3.10 due to a conflict.
Please backport usingcherry_picker on command line.
cherry_picker 86ebd5c3fa9ac0fba3b651f1d4abfca79614af5f 3.10

@miss-islington
Copy link
Contributor

Sorry@mdboom and@zooba, I had trouble checking out the3.11 backport branch.
Please retry by removing and re-adding the "needs backport to 3.11" label.
Alternatively, you can backport usingcherry_picker on the command line.
cherry_picker 86ebd5c3fa9ac0fba3b651f1d4abfca79614af5f 3.11

@zooba
Copy link
Member

This ought to be safe to backport from a public API perspective, but I'll let the bot figure out how easy it'll be.

@zoobazooba added needs backport to 3.11only security fixes and removed needs backport to 3.11only security fixes needs backport to 3.10only security fixes labelsFeb 8, 2023
@miss-islington
Copy link
Contributor

Thanks@mdboom for the PR, and@zooba for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry,@mdboom and@zooba, I could not cleanly backport this to3.11 due to a conflict.
Please backport usingcherry_picker on command line.
cherry_picker 86ebd5c3fa9ac0fba3b651f1d4abfca79614af5f 3.11

@zooba
Copy link
Member

Okay, it's not going to backport trivially :) If anyone's up for it, go ahead, otherwise I'll take a look when I get time.

@hauntsaninja
Copy link
Contributor

(It's relatively rare that we backport performance improvements, so if it ends up being complicated maybe we shouldn't)

erlend-aasland reacted with thumbs up emoji

carljm added a commit to carljm/cpython that referenced this pull requestFeb 9, 2023
* main: (82 commits)pythongh-101670: typo fix in PyImport_ExtendInittab() (python#101723)pythonGH-99293: Document that `Py_TPFLAGS_VALID_VERSION_TAG` shouldn't be used. (#pythonGH-101736)  no-issue: Add Dong-hee Na as the cjkcodecs codeowner (pythongh-101731)pythongh-101678: Merge math_1_to_whatever() and math_1() (python#101730)pythongh-101678: refactor the math module to use special functions from c11 (pythonGH-101679)pythongh-85984: Remove legacy Lib/pty.py code. (python#92365)pythongh-98831: Use opcode metadata for stack_effect() (python#101704)pythongh-101283: Version was just released, so should be changed in 3.11.3 (pythonGH-101719)pythongh-101283: Fix use of unbound variable (pythonGH-101712)pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286)pythongh-101277: Port more itertools static types to heap types (python#101304)pythongh-98831: Modernize CALL and family (python#101508)pythonGH-101696: invalidate type version tag in `_PyStaticType_Dealloc` (python#101697)pythongh-100221: Fix creating dirs in `make sharedinstall` (pythonGH-100329)pythongh-101670: typo fix in PyImport_AppendInittab() (pythonGH-101672)pythongh-101196: Make isdir/isfile/exists faster on Windows (pythonGH-101324)pythongh-101614: Don't treat python3_d.dll as a Python DLL when checking extension modules for incompatibility (pythonGH-101615)pythongh-100933: Improve `check_element` helper in `test_xml_etree` (python#100934)pythonGH-101578: Normalize the current exception (pythonGH-101607)pythongh-47937: Note that Popen attributes are read-only (python#93070)  ...
@hauntsaninjahauntsaninja removed the needs backport to 3.11only security fixes labelFeb 25, 2023
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@eryksuneryksuneryksun left review comments

@AlexWaygoodAlexWaygoodAlexWaygood left review comments

@zoobazoobazooba approved these changes

@erlend-aaslanderlend-aaslanderlend-aasland approved these changes

@ethanfurmanethanfurmanAwaiting requested review from ethanfurman

Assignees

@zoobazooba

Labels

3.12only security fixes

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

8 participants

@mdboom@eryksun@bedevere-bot@erlend-aasland@zooba@miss-islington@hauntsaninja@AlexWaygood

[8]ページ先頭

©2009-2025 Movatter.jp