Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-89727: Fix os.walk to handle late editing of dirs#100703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
jonburdo wants to merge14 commits intopython:mainfromjonburdo:os-walk-dirs-editing

Conversation

@jonburdo
Copy link
Contributor

@jonburdojonburdo commentedJan 3, 2023
edited
Loading

Allow modification of the dirs returned by anos.walk entry to affect which subdirectories are walked, even when the modification occurs after iteration over dirs has begun. This was the behavior for a long time before it was changed in#100703

@jonburdojonburdo changed the titleadopt old os.walk behavior to handle changes to dirs list after itera…gh-89727: Fix os.walk to handle late editing of dirsJan 3, 2023
@jonburdo
Copy link
ContributorAuthor

jonburdo commentedJan 3, 2023
edited
Loading

to do:

  • clean up code
  • add some tests

@jonburdo
Copy link
ContributorAuthor

TheBytesWalkTests.walk andBytesFwalkTests.walk wrapper functions actually prevent late modification (after iteration ofdirs has started) because of the encoding/decoding logic they use which setsdirs before iterating over it. These are of course only for testing purposes, but it means that if we want to runtest_walk_late_dirs_modification in them as well as inWalkTests, we'd need an alternatewalk wrapper which this test uses and some logic to handle the bytes results in theBytes... classes - maybe not worth it?

Lib/os.py Outdated
Comment on lines 360 to 364
# We may not have read permission for top, in which case we can't
# get a list of the files the directory contains.
# We suppress the exception here, rather than blow up for a
# minor reason when (say) a thousand readable directories are still
# left to visit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This comment belongs with thetry: scandir block, which is moved from here and now exists in three separate locations.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Good point. It can probably just go on the first instance of this block

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Actually in the latest commit where this block appears in separate functions, I decided to just remove this comment. The behavior is described in the docstring and I think the try-except block is pretty clear. Happy to add it or part of it back if deemed necessary

Lib/os.py Outdated
ifcont:
continue

iftopdown:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Such a high percentage of code in this function is now under eitherif topdown or the reverse that I'm kind of curious what it would look like to just have separate (internal) functions for topdown vs bottom-up. But this would still increase code duplication significantly, and move the duplicated code/structure further apart, so I suspect it's still better this way.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I was actually thinking about this. I just gave it a shot to see. This seems to me like one of those cases where there are enough little differences in logic that it's much cleaner to separate the two. If we didn't support dir modification for topdown or didn't care about performance it might be simpler, but when you have a lot of the same logic interspersed by a lot of little differences it gets messy. In these kinds of situations I also tend to prefer some duplication between functions over one big function with conditions inside of loops.

I also find it easier to look at the two sets of logic separately, but either way works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yeah, I think the split version seems fine. Will wait a bit and see if some core devs have opinions.

jonburdo reacted with thumbs up emoji
@jonburdo
Copy link
ContributorAuthor

A few more notes for context:

Thefirst commit has more minimal changes similar tothis suggestion. In that commit, because we're using an iterator, the implementation is a bit awkward and inefficient (i.e. only ever callingnext(dirs) once per loop). There's two cases in which we ignore the value given bynext(dirs) and move on:

  1. Withtopdown=True, thefollowlinks or not islink(top) check fails
  2. The linescandir_it = scandir(top) throws anOSError which is caught (and not re-raised byonerror)
    The subsequent changes handle these more naturally by looping through the iterator instead of callingcontinue to go through the outer loop again.

The subsequent changes also separate the top down and bottom up logic further (even before splitting into separate functions). Note that even before this PR, there was a fair amount of code that only applied when we havetopdown=False:

  • Theif isinstance(top, tuple): call at the top of the outer loop
  • walk_dirs creation andif isinstance(top, tuple): inside thescandir_it loop
  • if topdown: block at the bottom

@zmievsa
Copy link
Contributor

zmievsa commentedJan 6, 2023
edited
Loading

Do we really want to introduce this much complexity for a completely undocumented and possibly unused feature?

barneygale reacted with thumbs up emoji

@jonburdo
Copy link
ContributorAuthor

Do we really want to introduce this much complexity for a completely undocumented and possibly unused feature?

Maybe not. I made this PR to see what it would look like, but am not attached to it. If I were designingos.walk from scratch, I wouldn't support this late editing of dirs - at least not in this way where the list can be edited during its iteration. I think this could be reasonably considered behavior that produces an undefined result, or simply considered a bug, although it was never documented as such.

But I strongly suspect it is used out there (maybe even by accident), and consistent behavior is valuable. If we don't support this, it'd be nice to have some sort of warning about the behavior change but I also don't see a reasonable way to do that. (Checking if dirs was modified would be complex or expensive too and kind of weird).

I'd suggest at least noting the change in behavior in release notes if we don't support the old behavior. I also do prefer separating the top down and bottom up logic as in this PR, although that could be a separate matter.

Other considerations:

  • Complexity (or at least code change) is a bit less in the first commit here if that seems any better.
  • I don't like the idea of copying this mess over topathlib.Path.walk, and I also don't like the idea of having different behavior, especially if this is considered a bug or undefined behavior. It's also useful to keep the code similar between the two for maintenance, understanding logic, future changes, etc

@barneygale
Copy link
Contributor

I don't like the idea of copying this mess over to pathlib.Path.walk, and I also don't like the idea of having different behavior, especially if this is considered a bug or undefined behavior. It's also useful to keep the code similar between the two for maintenance, understanding logic, future changes, etc

pathlib's version ofwalk() was mergedonly because it improved the ergonomics ofos.walk(), including handling symlinks to directories more consistently, which isn't possible inos.walk() for backwards compatibility.

Mutatingdirnames after resuming the walk isn't included or even hinted at in the pathlib docs so I don't want to expend any effort supporting it.

zmievsa and jonburdo reacted with thumbs up emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@carljmcarljmcarljm left review comments

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

6 participants

@jonburdo@zmievsa@barneygale@carljm@bedevere-bot@erlend-aasland

[8]ページ先頭

©2009-2025 Movatter.jp