Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

GH-119169: Skip reversing sibling directories inos.[f]walk(topdown=False)#119473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed

Conversation

@barneygale
Copy link
Contributor

@barneygalebarneygale commentedMay 23, 2024
edited
Loading

Inos.walk(topdown=False), don't bother reversingwalk_dirs. This means that sibling directories are visited in a different order, but 1) that order is arbitrary and comes fromos.scandir(), and 2) unlike in top-down mode, users can't influence which directories are visited or in what order.

This change causedtest_walk_bottom_up to fail. I think this test made assertions that were too specific and relied onos.scandir() returning things in a specific order. The test code is pretty hard to understand once you get into the details. I've replaced it with a version of the same test fromtest_pathlib_abc.py. The new test checks that every directory is yieldedbefore all its ancestors, andafter all its descendants, but does not check the order of its yieldedsiblings.

…wn=False)`In `os.walk(topdown=False)`, don't bother reversing `walk_dirs`. This meansthat sibling directories are visited in a different order, but 1) thatorder is arbitrary and comes from `os.scandir()`, and 2) unlike in top-downmode, users can't influence which directories are visited or in what order.This change caused `test_walk_bottom_up` to fail. I think this test madeassertions that were too specific and relied on `os.scandir()` returningthings in a specific order, and the test code is pretty hard to understandonce you get into the details. I've replaced it with a version of the sametest from `test_pathlib_abc.py`.
@barneygalebarneygale changed the titleGH-119169: Skip reversing sibling directories inos.walk(topdown=False)GH-119169: Skip reversing sibling directories inos.[f]walk(topdown=False)May 30, 2024
@barneygale
Copy link
ContributorAuthor

Hey@serhiy-storchaka, just for context: this change doesn't provide much of a speedup in itself, but it enables more significant speedups seen inGH-119186. I've separated it out here because it's an observable change that breaks a test. But I suspect that doesn't matter as thescandir() result order is arbitrary and the order of sibling visits can't be influenced. What do you think? Thank you!

@serhiy-storchaka
Copy link
Member

Ifos.walk(topdown=False) reverses the order of visiting directories, this should be fixed.

@barneygale
Copy link
ContributorAuthor

It presently visits sibling directories in the same order they appear indirnames.

What I'm proposing here is that we don'tneed to visit sibling directories in the same order they appear indirnames in bottom-up mode, because:

  1. The order is arbitrary (it comes fromos.scandir())
  2. Users aren't able to influence the order mid-walk by re-orderingdirnames (this is only possible in top-down mode)

Perhaps a worked example would be helpful? I can put one together.

@barneygale
Copy link
ContributorAuthor

barneygale commentedJul 8, 2024
edited
Loading

Here's an example, first with the currentmain:

>>>os.mkdir('mydir')>>>os.mkdir('mydir/subdir1')>>>os.mkdir('mydir/subdir2')>>>forentryinos.walk('mydir',topdown=False):...print(entry)...     ('mydir/subdir2', [], [])('mydir/subdir1', [], [])('mydir', ['subdir2','subdir1'], [])

And with this PR:

>>>forentryinos.walk('mydir',topdown=False):...print(entry)...     ('mydir/subdir1', [], [])('mydir/subdir2', [], [])('mydir', ['subdir2','subdir1'], [])

Note how the entry forsubdir1 is now yielded beforesubdir2, and how this no longer corresponds to the order indirnames formydir.

If we think the new behaviour is acceptable (e.g. because we never guaranteed the order of sibling visits) then it unlocks an optimization: we can add paths to the stackwhile scanning directories rather than in a separate loop afterwards.

@serhiy-storchaka
Copy link
Member

I think that it would be confusing ifos.walk(),grep.grep(), or manual recursive use ofos.listdir() andos.scandir() produce different results. The order depends on the file system, but at least it is consistent.

The fact that the current code needs to reverse the lists of directories is just an implementation detail. It is not exposed to the user. If the code useddeque instead of a list, it would not need to reverse them.

BTW, how much it makes difference? What it in comparison of usingdeque?

barneygale reacted with thumbs up emoji

@barneygale
Copy link
ContributorAuthor

That makes sense, thanks.

BTW, how much it makes difference?

Taking into account the speedups inGH-119186 andGH-121433, it's about 1% foros.walk() and 6% foros.fwalk(). Nothing special.

What it in comparison of using deque?

I'll get back to you.

@barneygale
Copy link
ContributorAuthor

Hm, thedeque would help with a breadth-first implementation, but I don't think it helps here without alot of code changes. I'll close this PR and its two dependents.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@serhiy-storchakaserhiy-storchakaAwaiting requested review from serhiy-storchaka

Assignees

No one assigned

Labels

awaiting core reviewperformancePerformance or resource usageskip news

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

@barneygale@serhiy-storchaka

[8]ページ先頭

©2009-2025 Movatter.jp