Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

GH-127456: pathlib ABCs: add protocol for path parser#127494

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
barneygale merged 12 commits intopython:mainfrombarneygale:gh-127456
Dec 9, 2024

Conversation

barneygale
Copy link
Contributor

@barneygalebarneygale commentedDec 1, 2024
edited
Loading

Change the default value ofPurePathBase.parser fromParserBase() toposixpath. As a result, user subclasses ofPurePathBase andPathBase use POSIX path syntax by default, which is very often desirable.

Movepathlib._abc.ParserBase topathlib._types.Parser, and convert it to a runtime-checkable protocol.

No user-facing changes as the pathlib ABCs are still private.

`posixpath`. As a result, user subclasses of `PurePathBase` and `PathBase`use POSIX path syntax by default, which is very often desirable.
…ies.Objects of this type provide a subset of the `os.DirEntry` API,specifically those methods and attributes needed to implement `glob()` and`walk()`.
Objects of this type provide a small subset of the `os.stat_result` API,specifically attributes for the file type, permissions and location/offset.
@encukou
Copy link
Member

Penny for your thoughts on this@encukou :)

This looks reasonable, butStatResult looks rather useless with its three mandatory but unreliable members:

  • st_mode has UNIX permission bits and aplatform-specific representation of file type (since AFAIK, POSIX doesn't specify the values ofS_IFREG et.al.)
  • st_ino &st_dev, which can be 0 if the info is not available -- if it weren't for the tuple API, the attributes could just as well be missing in this case.

Also, I guess typing has no way to say that e.g. ifst_birthtime is present, it needs to be a time in seconds.

It seems that for your goal of supporting virtual filesystems, it might be better to rely onDirEntry, withstat remaining as a backend for filesystems where it's useful.
I wonder if we need a FS-agnostic version ofst_ino/st_dev, like:

def key(self) -> Hashable:    """Return a comparison key. Entries with the same key refer to the same file."""    # For a Unix-like FS:    ...    return (stat_result.st_dev, stat_result.st_ino)
barneygale and zooba reacted with thumbs up emoji

@barneygale
Copy link
ContributorAuthor

barneygale commentedDec 3, 2024
edited
Loading

Thank you for the feedback and reviews.

It seems that for your goal of supporting virtual filesystems, it might be better to rely onDirEntry, withstat remaining as a backend for filesystems where it's useful. I wonder if we need a FS-agnostic version ofst_ino/st_dev, like:

def key(self) -> Hashable:    """Return a comparison key. Entries with the same key refer to the same file."""    # For a Unix-like FS:    ...    return (stat_result.st_dev, stat_result.st_ino)

I've occasionally thought about adding aPathBase.status() abstract method, which could return aDirEntry-like object with caching methods. Methods likePathBase.is_dir() could then be implemented viaself.status().is_dir(). If we did that, we could either deletePathBase.stat(), or implement it in terms ofself.status(). Would that be better?

I don't thinkDirEntry is an appropriate return type forstatus(), because we're not scanning the parent directory, rather we're querying the path directly. Perhaps it's just a naming problem.

@barneygalebarneygale marked this pull request as ready for reviewDecember 3, 2024 19:21
barneygaleand others added2 commitsDecember 3, 2024 20:06
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@encukou
Copy link
Member

I don't think DirEntry is an appropriate return type for status(), because we're not scanning the parent directory, rather we're querying the path directly. Perhaps it's just a naming problem.

Status, then? Likestat, but not a C-style abbr.
OrFileInfo?

barneygale reacted with thumbs up emoji

@zooba
Copy link
Member

I'm not a fan of thestat() model at all. It's the best of a bad lot, I know, but I'd rather have some kind of API that lets us use cached information foris_file()/is_dir()/etc. rather than having to guarantee that it's fresh for each call.

I'm not sure what such an API should look like, apart from I want it to have thePath object interface, and be using cached responses about the state of the path (or fetching them and caching on demand). There are more than enough properties that might as well be cached that it's worth it, and the TOCTOU issues basically all exist even with our current model.

barneygale reacted with thumbs up emoji

@encukou
Copy link
Member

Brainstorming:
Add aFileInfo protocol, implemented by classes that hold file info for a path. All the information in it is cached; the cache(s) might be initialized when an attribute is accessed (this canany attribute -- e.g. accessing the creation time might cache all ofstat()), or when theFileInfo is created. The cache(s) can't be cleared (externally,FileInfo is immutable; but you can make a fresh instance).
Paths would get acached_property-likeinfo attribute, initialized to aFileInfo either on first access or when the Path is created (e.g. to hold extra info from iterdir).
As withcached_property, the cache can be cleared usingdel path.info.
We'd need anew_info() method as well, to hand out fresh info but leave the cache alone -- mainly as a backend for the current Path methods.

A downside is that this breaks the rule that touching the filesystem is done by methods, not attributes. (Can we say rule was for getting “fresh” information, and we can use something else here?)

barneygale and zooba reacted with thumbs up emoji

@barneygale
Copy link
ContributorAuthor

This all sounds great to me.

A downside is that this breaks the rule that touching the filesystem is done by methods, not attributes.

Why would accessingPath.info touch the filesystem? Couldn't make it so that onlyFileInfo.is_*() methods can perform FS access?

zooba reacted with thumbs up emoji

@barneygale
Copy link
ContributorAuthor

barneygale commentedDec 5, 2024
edited
Loading

Oh, I suppose you might want tolstat() orstat() so thatFileInfo always represents something that exists/existed. Otherwise we'd need aFileInfo.exists() method, and if we're intending to storeos.DirEntry objects asPath.info when paths are generated fromiterdir(), then we'd need to add anos.DirEntry.exists() method too.

@barneygale
Copy link
ContributorAuthor

Courtesy CCing@ncoghlan, who proposed something similarhere. I shot it down then because it wasn't clear to me when to call[l]stat(), how to clear caches, and how it interacts withPath.is_*(). I'm also a tiny bit uneasy makingPath stateful.

@zooba
Copy link
Member

(Process note - we should probably move the API design discussion off this PR)

exists() shouldn't be an issue, because we can cache that asTrue orFalse at creation time (e.g. returned fromglob() implies thatexists() == True at some point in the recent past; created by a user implies thatexists() is unknown and needs to make a filesystem call). The same could potentially be true for any other state, most obviouslyis_dir() andis_file(), but also any of the regular stat attributes (where a subclass could well return their own type from.stat()).

What we need to design is the way for a user to discover that they've got cached values, and reset it. ACachedInfo mixin (just for type check purposes) and a dedicated subclass (so that e.g.Path(p) would recreate and uncache all members) would seem to suffice, though I can also see the reasoning for adding a new member to manage it.

@barneygale
Copy link
ContributorAuthor

@barneygalebarneygale changed the titleGH-127456: pathlib ABCs: add protocols for supporting typesGH-127456: pathlib ABCs: add protocol for path parserDec 7, 2024
@barneygale
Copy link
ContributorAuthor

I've removedDirEntry andStatResult from this PR, so it now covers only onlyParser.

@raymondiiii

This comment was marked as spam.

Copy link
Member

@zoobazooba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The reduced PR to just the parser changes LGTM

encukou reacted with thumbs up emoji
@barneygale
Copy link
ContributorAuthor

Thanks both, very much appreciate the feedback :-)

@barneygalebarneygaleenabled auto-merge (squash)December 9, 2024 18:08
@barneygalebarneygale merged commit5c89adf intopython:mainDec 9, 2024
35 of 36 checks passed
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull requestJan 8, 2025
…27494)Change the default value of `PurePathBase.parser` from `ParserBase()` to`posixpath`. As a result, user subclasses of `PurePathBase` and `PathBase`use POSIX path syntax by default, which is very often desirable.Move `pathlib._abc.ParserBase` to `pathlib._types.Parser`, and convert itto a runtime-checkable protocol.Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@picnixzpicnixzpicnixz left review comments

@nineteendonineteendonineteendo left review comments

@zoobazoobazooba approved these changes

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

6 participants
@barneygale@encukou@zooba@raymondiiii@picnixz@nineteendo

[8]ページ先頭

©2009-2025 Movatter.jp