Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-127647: Add typing.Reader and Writer protocols#127648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
JelleZijlstra merged 37 commits intopython:mainfromsrittau:typing-readable-writable
Mar 6, 2025

Conversation

srittau
Copy link
Contributor

@srittausrittau commentedDec 5, 2024
edited by github-actionsbot
Loading

@srittau
Copy link
ContributorAuthor

A few design considerations:

  • I used the namesReader andWriter to match existing ABC names intyping/collections.abc. Alternatives areReadable andWritable, but I think they are used more rarely for these kind of (pseudo-)protocols.SupportsX would work forWriter, but not forReader, since the latter supports multiple methods. (Any we maybe want to useSupportsRead etc. at some point for tighter protocols.)
  • I would prefer these protocols to be inio, where they fit better thematically, but that would require importingtyping with unforseeable (performance) implications for such a foundational module.
  • I deliberated using tighter protocols, but for ease of use reasons – and since they are mostly an alternative to usingIO et al. – I went with larger protocols for now.
  • Seekable (with methodsseek andtell) would be an interesting addition as well, but I think this should wait for easy protocol composition.

@srittau
Copy link
ContributorAuthor

Copy link
Member

@picnixzpicnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm a bit sad that we can't use covariance and contravariance in the protocols since a subclass could use it with an invariant type.

Small improvements to the docstrings and signature

Protocol for reading from a file or other input stream.

.. method:: read(size=...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
..method::read(size=...)
..method::read(size=..., /)

(Same for other methods)

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I was unsure of how much of the signature to document. For example, would it also make sense to add argument and/or return types? I've added the slashes for now.

@AlexWaygood
Copy link
Member

I would prefer these protocols to be inio, where they fit better thematically, but that would require importingtyping with unforseeable (performance) implications for such a foundational module.

io feels like a more natural home to me too. Could you not do something similar to the pseudo-protocol ABCs incollections.abc,contextlib andos.path? The approach there is:

  • Make the classes ABCs rather than protocols
  • But document them as protocols
  • And write__subclasshook__ methods so that they behave identically to runtime-checkable Protocols in theirisinstance()/issubclass() behaviour:
    @classmethod
    def__subclasshook__(cls,C):
    ifclsisAbstractContextManager:
    return_collections_abc._check_methods(C,"__enter__","__exit__")
    returnNotImplemented
  • And special-case the ABCs intyping.py so that thetyping module treats them identically to protocols:

    cpython/Lib/typing.py

    Lines 1940 to 1948 in023b7d2

    _PROTO_ALLOWLIST= {
    'collections.abc': [
    'Callable','Awaitable','Iterable','Iterator','AsyncIterable',
    'AsyncIterator','Hashable','Sized','Container','Collection',
    'Reversible','Buffer',
    ],
    'contextlib': ['AbstractContextManager','AbstractAsyncContextManager'],
    'os': ['PathLike'],
    }
  • And pretend in typeshed that they're protocols, so that type checkers treat them as protocols (I feel likepython/typeshed@f554f54 missed the point there slightly)

@srittau
Copy link
ContributorAuthor

That feels terribly hackish to me, even if there is precedence. And in my experience, hackish stuff often falls on one's feet one way or the other. But I can move it toio if that's the consensus.

@AlexWaygood
Copy link
Member

This has been the way thecollections.abc andcontextlib ABCs have always worked. As far as I'm aware, it's worked pretty well up to now. And it doesn't actually feel like the most hackish part ofcollections.abc ;)

_sys.modules['collections.abc']=_collections_abc
abc=_collections_abc

@srittau
Copy link
ContributorAuthor

Well, considering thattyping.is_protocol uses the_is_protocol marker to determine whether something is a protocol, I guess that's the official way to mark protocols at runtime, not the fact that they are derived fromProtocol. Whichcontradicts the typing spec at the moment. This all seems a bit precarious to me.

@AlexWaygood
Copy link
Member

I'm open to something less hacky if you can find a generalized solution that doesn't make_collections_abc depend ontyping (which would significantly slow down startup for all Python applications, since_collections_abc is imported as part of Python's initialisation).

@AlexWaygood
Copy link
Member

considering thattyping.is_protocol uses the_is_protocol marker to determine whether something is a protocol, I guess that's the official way to mark protocols at runtime, not the fact that they are derived fromProtocol. Whichcontradicts the typing spec at the moment.

I don't think the typing spec needs to be cognisant of the hacks we use to make things work at runtime in the stdlib! This knowledge is irrelevant for third-party users of protocols, to users of type checkers and to implementers of type checkers. It's definitely not encouraged for any external users to make use of the_is_protocol attribute.

JelleZijlstra reacted with thumbs up emoji

@srittau
Copy link
ContributorAuthor

CI seems flaky, failures are unrelated.

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
srittauand others added2 commitsFebruary 27, 2025 13:08
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Copy link
Member

@AlexWaygoodAlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This LGTM now but it might be good to have one or two simple tests covering the new code. It probably doesn't need to be much more than this (added totest_io.py)

deftest_supports_index(self):
self.assertIsSubclass(int,typing.SupportsIndex)
self.assertNotIsSubclass(str,typing.SupportsIndex)

and maybe an addition to thetest_typing tests similar to this one:

deftest_collections_protocols_allowed(self):
@runtime_checkable
classCustom(collections.abc.Iterable,Protocol):
defclose(self): ...
classA:pass
classB:
def__iter__(self):
return []
defclose(self):
return0
self.assertIsSubclass(B,Custom)
self.assertNotIsSubclass(A,Custom)
@runtime_checkable
classReleasableBuffer(collections.abc.Buffer,Protocol):
def__release_buffer__(self,mv:memoryview)->None: ...
classC:pass
classD:
def__buffer__(self,flags:int)->memoryview:
returnmemoryview(b'')
def__release_buffer__(self,mv:memoryview)->None:
pass
self.assertIsSubclass(D,ReleasableBuffer)
self.assertIsInstance(D(),ReleasableBuffer)
self.assertNotIsSubclass(C,ReleasableBuffer)
self.assertNotIsInstance(C(),ReleasableBuffer)

Copy link
Member

@AlexWaygoodAlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is great, thank you! Just a few docs nits remaining

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@AlexWaygood
Copy link
Member

AlexWaygood commentedFeb 27, 2025
edited
Loading

I'll leave it a little bit in case any of the other reviewers here have any remaining concerns before merging

__slots__ = ()

@abc.abstractmethod
def read(self, size=..., /):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Sorry if this has been discussed before, but I'm unsure on the runtime use ofsize=... (I didn't notice this earlier in my documentation review, sorry).

Almost every otherread(size) method I can find has a default of eitherNone or-1. I also can't find another method in the stdlib with a default of... (outside of the recently-added protocols inwsgiref.types).

Would it be better to havesize=-1, to indicate that the method takes anint? I'm not sure how much we want typeshed-like practices to leak into the standard library.

Suggested change
defread(self,size=...,/):
defread(self,size=-1,/):

A

Copy link
ContributorAuthor

@srittausrittauFeb 27, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Mandating defaults is not really something you can do in a protocol. I also wouldn't want to mandate that implementors have to use a default of-1, because – as you said – some implementations useNone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Right, but this isn't a protocol -- it's an ABC, which do have defaults -- see e.g.collections.abc.Generator. All of theread() methods inio are documented as havingread(size=-1, /), and given these ABCs are going intoio, I think we should be consistent with that interface, or have good reason to diverge from it (& document why).

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's supposed to be a protocol, not an ABC. (Notwithstanding the fact that all protocols are ABCs.) It's just a protocol in the implementation for performance reasons. And using-1 as a default would give users the very wrong impression that they can useread(-1) when that may or may not actually be supported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

using-1 as a default would give users the very wrong impression that they can useread(-1) when that may or may not actually be supported.

From the documentation ofio.RawIOBase.read():

Read up tosize bytes from the object and return them. As a convenience, ifsize is unspecified or -1, all bytes until EOF are returned.

I would expect theio.Reader.read() ABC/protocol to have this same guarantee, for a 'properly' implementedread() method (according to theio expecations). In the proposed documentation, we say:

Read data from the input stream and return it. Ifsize is specified, it should be an integer, and at most size items (bytes/characters) will be read.

This forbidsNone (good!), but is silent on what happens shouldsize be omitted. I still think we should use-1 instead of..., but at the very least we should include in the documentation the contract for what happens whenread() is called with no arguments.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I disagree.-1 is the default forsome implementations, but the protocol should not make any default mandatory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm still not sure I follow, though -- currently the ABC/protocol has the mandatory default ofsize=..., which is always invalid and not the correct type.-1 is valid for all interfaces specified by theio documentation, which is where this new type is being added.

I ran the following withmypy --strict and it passed, so I don't think type checkers care about default values (and as discussed, the type forbids using non-integer types):

importabcclassReader(metaclass=abc.ABCMeta):__slots__= ()@abc.abstractmethoddefread(self,size:int=-1,/)->bytes:passclassCustomReader(Reader):defread(self,size:int=-1,/)->bytes:returnb''classCustomReaderZero(Reader):defread(self,size:int=0,/)->bytes:returnb''assertissubclass(CustomReader,Reader)assertissubclass(CustomReaderZero,Reader)assertisinstance(CustomReader(),Reader)assertisinstance(CustomReaderZero(),Reader)defreader_func(r:Reader)->None:r.read()reader_func(CustomReader())reader_func(CustomReaderZero())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I agree with Sebastian here; we should use... because the protocol need not mandate any particular default.

-1 is valid for all interfaces specified by the io documentation

The protocol should also match other file-like classes defined elsewhere in the stdlib or even in third-party libraries. When defining a protocol it's often useful to be permissive, so that all objects that are intended to match the protocol actually match it.

@srittau
Copy link
ContributorAuthor

Could this be merged before the next alpha next week?

@JelleZijlstraJelleZijlstra merged commitc6dd234 intopython:mainMar 6, 2025
39 checks passed
@srittausrittau deleted the typing-readable-writable branchMarch 6, 2025 16:02
@bluetech
Copy link

I only saw the update now, great that it's merged!

I have a few comments. I hope they're not too bothersome - I just think these protocols are important so it's worth the effort.

Can the docs include type annotations? I know that Sphinx supports it. Given that this is meant for static typing, it makes sense to me to include.

I think it is important to document the expected semantics of these methods, otherwise you don't really know what you're going to get when you accept aReader orWriter. What I can think of:

  • What should happen on EOF?
  • Is EOF sticky?
  • Short reads/writes
  • What does it mean to omitsize?
  • What happens onsize 0?
  • Issize allowed to be -1 or more generally negative?
  • Canread return value beNone? Empty?
  • Canwrite return value be 0?
  • Expected exceptions --IOError?

I don't like the parameter namesize, because from the caller perspective it is not exactly a size of anything. I would have preferred justn. But, I see this name is already used e.g.IOBase and consistency probably wins the argument here.

The docs say "If size is specified, it should be an integer, and at most size items (bytes/characters) will be read" and "Write data to the output stream and return the number of items (bytes/characters) written". I would drop the "(bytes/characters)" sinceT is not necessarily that.

The docs say "The following protocols can be used for annotating function and method arguments for simple stream reading or writing operations". I have two nitpicks:

  • What is annotated is the parameter, not the argument (= a value for a parameter in a specific invocation). So would change "arguments" -> "parameters".
  • I don't think it is just for simple operations, so I'd remove "simple".

I thinkT should have some bound. I think at leastSized, it's hard to interpret the return value without a notion of size.

@srittau
Copy link
ContributorAuthor

A merged PR is probably not the best place to discuss this, but a few points:

Can the docs include type annotations?

Personally, I'd be fine with, and actually prefer if the docs used type annotations more, but I didn't want to divert from the existing style in the documentation.

I think it is important to document the expected semantics of these methods

While it would make sense to me to document some of those semantics (e.g. size must be>= 1), most are not part of this protocol. Exceptions and EOF are not concepts that are understood by those protocols.

I would drop the "(bytes/characters)" since T is not necessarily that.

It might have been clearer to say "(e.g. bytes or characters)", but I think calling out those specific types here makes the documentation clearer, as in 99% of cases it will be either of those. Otherwise the concept of "item" is a bit nebulous.

I think T should have some bound.

I don't think it's necessary to make the protocol more complex. Whoever is using the protocol will have to choose an type that they can work with.

AlexWaygood and JelleZijlstra reacted with thumbs up emoji

@srittau
Copy link
ContributorAuthor

Also in general, protocols are not a replacement for API documentation. It doesn't make too much sense to include expectations around a protocol that can't be checked statically. To document these expectations is the responsibility of the author of a function that uses these protocols, and to follow them is the responsibility of the user. These expectations can vary significantly from function to function.

AlexWaygood reacted with thumbs up emoji

@bluetech
Copy link

(Please feel free to ignore this if you don't want to discuss this further -- I couldn't resist posting another comment...)

If I understand correctly, you prefer a narrow approach to these protocols because 1) they can't be checked statically 2) semantics can vary between uses. My replies:

  1. In an ideal world, it would be possible to convey all semantics in a static way. But if it's not possible, documentation is the alternative. Stated differently, the Platonic ideal of theReader interface contains all of its (agreed) semantics, but its projection onto the Python type system is incomplete, so the documentation should try its best to fill the gap. That's how I view it at least.

  2. Why should the semantics vary between uses? A single set of semantics has a big advantage - it reduces (N impls * M uses) coordination points to N + M. As an implementer, I can (implicitly or explicitly) follow the protocol and don't need to invent or document my semantics. As a user, I write to a single set of semantics, not needing to handle multiple sets, defensively protect against unknown ones, or document my expectations. As a "middleware" (e.g. aBufferedReader which adds buffering to any reader), I can take any generic reader. What's the advantage of allowing implementations to be inconsistent on something like how EOF is signaled?

srittau added a commit to srittau/typing_extensions that referenced this pull requestApr 9, 2025
seehwan pushed a commit to seehwan/cpython that referenced this pull requestApr 16, 2025
mdboom added a commit to mdboom/cpython that referenced this pull requestApr 25, 2025
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@sobolevnsobolevnsobolevn left review comments

@AA-TurnerAA-TurnerAA-Turner left review comments

@picnixzpicnixzpicnixz left review comments

@loic-simonloic-simonloic-simon left review comments

@JelleZijlstraJelleZijlstraJelleZijlstra approved these changes

@AlexWaygoodAlexWaygoodAlexWaygood approved these changes

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

8 participants
@srittau@AlexWaygood@bluetech@JelleZijlstra@sobolevn@AA-Turner@picnixz@loic-simon

[8]ページ先頭

©2009-2025 Movatter.jp