Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

GH-77609: Add follow_symlinks argument topathlib.Path.glob()#102616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged

Conversation

barneygale
Copy link
Contributor

@barneygalebarneygale commentedMar 12, 2023
edited
Loading

Add a keyword-onlyfollow_symlinks parameter topathlib.Path.glob() andrglob().

Whenfollow_symlinks isNone (the default), these methods follow symlinks except when evaluating "**" wildcards. When set to true or false, symlinks are always or never followed, respectively.

This allows us to addressGH-102613 in future.

Add a keyword-only *follow_symlinks* parameter to `pathlib.Path.glob()` and`rglob()`, defaulting to false. When set to true, symlinks to directoriesare followed as if they were directories.Previously these methods followed symlinks except when evaluating "`**`"wildcards; on Windows they returned paths in filesystem casing except whenevaluating non-wildcard tokens. Both these problems are solved here. Thiswill allow us to addresspythonGH-102613 andpythonGH-81079 in future commits.
@barneygale
Copy link
ContributorAuthor

barneygale commentedMar 13, 2023
edited
Loading

Table showing whethersymlinks to directories are followed:

Literal (e.g.foo/)Wildcard (e.g.foo*/)Recursive wildcard (**/)
glob() (previous behaviour)
glob()
glob(follow_symlinks=True)

Table showing whetherfilesystem case is returned:

Literal (e.g.foo/)Wildcard (e.g.foo*/)Recursive wildcard (**/)
glob() (previous behaviour)
glob()

@zooba
Copy link
Member

I wrote a fairly long post about my concerns on the changed behaviour, but then I realised that mymain question is whether the symlink following changes applies to the entire path or only the bit inpattern? If symlinks in the base path are going to be followed regardless of the option, I'm far less concerned.

However, I'd like to be able to figure this out from the documentation :) So there's a change required there to clarify (and maybe I'll be more concerned if it clarifies the wrong way)

@barneygale
Copy link
ContributorAuthor

Symlinks in the base path (thep inp.glob(...)) are followed regardless of the option! I'll beef up the docs.

@barneygale
Copy link
ContributorAuthor

Thanks for the feedback btw!

I'm also nervous about changing behaviour. I could revise this to makefollow_symlinks=None the default, and haveNone mean "follow symlinks except when expanding**" (i.e. current behaviour), but it's a bit icky. Or we could makefollow_symlinks=False still follow symlinks when expanding literal or non-recursive wildcards (again preserving existing behaviour), but it feels like misleading naming to me. Or perhaps we introduce a zsh-like*** token that follows symlinks. I'm not totally sure.

@zooba
Copy link
Member

I don't want to add***. I think we can live without that.

I think the behaviour of path parts before any wildcard are easily assumed to be equivalent of being part of the base path. That is:

>>> Path("a1/b2").glob("c3/**")# is the same as>>> Path("a1/b2/c3").glob("**")

Everything from the first wildcard becomes a match pattern. We need to recurse deep enough to create paths with the same number of segments, and then match all the candidates against the pattern. Since we'renow recursing, choosing not to follow symlinks by default is justifiable.

Given the current behaviour is to follow"spam*" but not** (that is, it's inconsistent), choosing one or the other feels more like a bugfix. I think we can describe it as "directory symlinks matched by using a wildcard will only be followed if the flag is set", whereas currently it depends on context, and a symlink may or may not be followed.

I'm pretty much leaning towards needing a deprecation period though. I think that'll be safest, and probably necessary, so the best way to do that is to support=None for the option right now and plan to change it toFalse in 3.14. Then we can show a deprecation warning when it's used. Hopefully there's a fairly separate way to implement the mixed behaviour that makes it easy to see what to remove later on.

barneygale reacted with thumbs up emoji

@barneygale
Copy link
ContributorAuthor

Right! I think I agree thatleading literals should always be followed.

What about trailing literals, e.g.path.glob('**/.ci/tox.ini')? There's a nice optimization we can do in these cases: we walk the directory treeonce when we hit the** token, and then filter the results through a regex to match the.ci/tox.ini parts. This avoids us needing toscandir() the same directory multiple times, or needing to keep aset to de-duplicate results.

... but that optimization only works if the** token walks the same directories as the.ci/tox.ini literal part would hit. If we treat symlinks inconsistently the whole thing breaks down D:

@barneygale
Copy link
ContributorAuthor

Also agree on the need to preserve existing behaviour and emit deprecation warnings before we change defaults. I'd argue the default should becometrue, notfalse, in future. It's the more useful behaviour.glob.glob()only supports following symlinks!

@barneygale
Copy link
ContributorAuthor

I've switched the default tofollow_symlinks=None, which preserves previous behaviour.

PassingNone to a boolean parameter wounds me. Wild idea: introduce new functions with better naming?

  • Path.expand(pattern, *, follow_symlinks=True) eventually replacesglob().
  • Path.search(pattern, *, follow_symlinks=True) eventually replacesrglob()

I mention this only because the names "glob" and "rglob" are a bit opaque unless you've done shell scripting before. The suggested names align withPurePath.match(). But feel free to shoot this right back down!

@zooba
Copy link
Member

PassingNone to a parameter is fine, especially since we're going to document it as deprecated and you shouldn't do it explicitly.

... but that optimization only works if the** token walks the same directories as the.ci/tox.ini literal part would hit. If we treat symlinks inconsistently the whole thing breaks down D:

Yeah, and so if we have**/.*/tox.ini we won't be able to follow symlinks matching.* because we never followed them in the first place. We can only match real directories that start with a dot.

Maybe we can optimise the case where the** is followed by a literal? And if there are any other wildcard characters (and an unspecifiedfollow_symlinks) we revert to a slower algorithm?

barneygale reacted with thumbs up emoji

@barneygale
Copy link
ContributorAuthor

barneygale commentedMar 14, 2023
edited
Loading

Maybe we can optimise the case where the ** is followed by a literal? And if there are any other wildcard characters (and an unspecified follow_symlinks) we revert to a slower algorithm?

👍 sounds good. As things stand with this PR,follow_symlinks=False andTrue could be optimized, whereasfollow_symlinks=None cannot. I'll leave the optimizations for a future PR as there's a couple other things I need to land first, e.g.#101398.

This PR still breaks backwards compat in two (minor?) ways:

  • ../ tokens in globs no longer navigate to the parent directory. Instead they're matched like other literals among the directory children, and so the match fails.
  • Filesystem case is returned for literal tokens, so if you have "C:/foo/bar.txt", thenPath("C:/").glob("FOO*/BAR.TXT") yieldsPath("C:/foo/bar.txt") and notPath("C:/foo/BAR.TXT").

@zooba
Copy link
Member

Matching filesystem case is an improvement, IMHO, even for back-compat.glob is a query operation, so there should be an expectation that the result comes from the FS and not the arguments.

On the.. case, I assume you mean after a wildcard? Before any wildcard we should be able to interpret it normally, and that is important. But I'm not even sure what behaviour I would expect after a wildcard? Is it meant to meanglob('**/spam/../eggs/*.txt') finds all text files in anyeggs directory provided it has aspam directory next to it? That's... clever... I guess? But I don't think we've ever suggested it would work.

I also don't have any real feeling for whether we shouldnormpath before globbing, or split at wildcards andnormpath, or do something else. What do other tools do with this kind of pattern? Particularly something that "removes" the recursion marker likeglob('**/../')

barneygale reacted with thumbs up emoji

@barneygale
Copy link
ContributorAuthor

barneygale commentedMar 14, 2023
edited
Loading

I also don't have any real feeling for whether we should normpath before globbing, or split at wildcards and normpath, or do something else. What do other tools do with this kind of pattern? Particularly something that "removes" the recursion marker like glob('**/../')

Pathlib uniformly avoids collapsing.. segments by purely lexical means, as it can change the meaning of a path that involves symlinks. So there's no use ofnormpath() anywhere!

I've fixed handling of.., so I think this might now conform with your expectations!

An updated table showing whether symlinks are followed:

Literal (e.g.foo/)Wildcard (e.g.foo*/)Recursive wildcard (**/)
glob() (previous behaviour)
glob()
glob(follow_symlinks=True)
glob(follow_symlinks=False)

Note that, for the moment, literals are handled exactly the same whether they appear before or after recursive wildcards.

zooba reacted with thumbs up emoji

@zooba
Copy link
Member

Looks good. We should still add the deprecation warning forfollow_symlinks=None (I forget if want to skip the warning for 3.12, but even so, add it in and we can comment it out so that at least it's done). The test that uses the default value should suppress the warning.

@barneygale
Copy link
ContributorAuthor

barneygale commentedMar 14, 2023
edited
Loading

Done. But honestly I don't feel great aboutpath.glob('foo') emitting a deprecation warning unless users passfollow_symlinks=False orfollow_symlinks=True. It's wordy and annoying and in many cases (e.g. when users don't expect symlinks), the choice doesn't matter. Also makes it harder for users to write backwards-compatible code. I'd love if there's a better way I'm not seeing.

@barneygale
Copy link
ContributorAuthor

Oh right, I also need to update the docs examples to passfollow_symlinks. I'll do that tomorrow.

@barneygale
Copy link
ContributorAuthor

Alternative PR that adds support for a*** wildcard:

@barneygale
Copy link
ContributorAuthor

Closing this PR as I believe#104176 is the right way forward.

@barneygale
Copy link
ContributorAuthor

@zooba I'd be much more comfortable with this PR if we delayed deprecatingfollow_symlinks=None for a couple of releases. Would that be alright? Otherwise I think users would need to either drop support for 3.11 in their code, or suppress deprecation warnings, right?

@zooba
Copy link
Member

We can document it as deprecated with no planned removal date (or planned change of default, in this case). Or just strongly recommend the use ofTrue orFalse and say thatNone is intended for legacy compatibility only.

If we think (or people start saying) that the default is harmful/a trap, we can deprecate properly at that point (and wait two releases before changing it).

barneygale reacted with thumbs up emoji

@barneygale
Copy link
ContributorAuthor

I'll pick this up again once#104512 lands.

@barneygalebarneygale marked this pull request as ready for reviewMay 23, 2023 23:11
@barneygale
Copy link
ContributorAuthor

In fact, it doesn't really matter whether#104512 lands first. Marking as ready for review!

@barneygale
Copy link
ContributorAuthor

Onfollow_symlinks=None: I don't consider itwrong (after all, Guido considered it correct when he fixed#70200). And at this stage it's no less performant. When I fix#102613 and implement the walk-and-match stategy, I'll add a warning in the docs thatfollow_symlinks=None can be much less performant. Around the Python 3.15 timeframe I'll deprecatefollow_symlinks=None. Hope that sounds alright,@zooba?

@barneygalebarneygale requested a review fromzoobaMay 27, 2023 18:12
@zooba
Copy link
Member

zooba commentedMay 29, 2023
edited
Loading

after all, Guido considered it correct when he fixed#70200

Interesting.

I think if you can clearly explain the different behaviour ("following symlinks matched by** but not other wildcard matches" would do) and make it sound useful (not just preserving compatibility), I don't think there's a need to deprecate it at all unless we want to change the default.

@barneygale
Copy link
ContributorAuthor

I think we're in agreement, then!

The docs in this PR say:

   By default, or when the *follow_symlinks* keyword-only argument is set to   ``None``, this method follows symlinks except when expanding "``**``"   wildcards. Set *follow_symlinks* to ``True`` to always follow symlinks, or   ``False`` to treat all symlinks as files.

Does that sound OK to you?

zooba reacted with thumbs up emoji

@zooba
Copy link
Member

Sounds good to me

@barneygale
Copy link
ContributorAuthor

Mega, thank you. Would you be up for reviewing the PR as a whole, if you have the time? No worries if not, and no urgency from my side.

The patch is somewhat repetitive due to the way globbing is currently implemented. If you find it offensive, we might want to simplify things first via#104512

@barneygale
Copy link
ContributorAuthor

Woot! Thank you Steve!

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@zoobazoobazooba approved these changes

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@barneygale@zooba@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp