NotificationsYou must be signed in to change notification settings
Fork32k
Star67.3k

gh-127647: Add typing.Reader and Writer protocols#127648

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

JelleZijlstra merged 37 commits intopython:mainfromsrittau:typing-readable-writable

Mar 6, 2025

Merged

gh-127647: Add typing.Reader and Writer protocols#127648

JelleZijlstra merged 37 commits intopython:mainfromsrittau:typing-readable-writable

Mar 6, 2025

Conversation

Copy link

Contributor

srittau commentedDec 5, 2024•
edited by github-actionsbot
Loading

Issue:Add simpleReader andWriter protocols #127647

📚 Documentation preview 📚:https://cpython-previews--127648.org.readthedocs.build/

Add typing.Reader and Writer protocols

b45fec0

srittau requested review fromJelleZijlstra andAlexWaygood ascode owners

December 5, 2024 18:55

bedevere-appbot mentioned this pull request

Dec 5, 2024

Add simpleReader andWriter protocols#127647

Closed

bedevere-appbot added the awaiting review label

Dec 5, 2024

Copy link

ContributorAuthor

srittau commentedDec 5, 2024

A few design considerations:

I used the namesReader andWriter to match existing ABC names intyping/collections.abc. Alternatives areReadable andWritable, but I think they are used more rarely for these kind of (pseudo-)protocols.SupportsX would work forWriter, but not forReader, since the latter supports multiple methods. (Any we maybe want to useSupportsRead etc. at some point for tighter protocols.)
I would prefer these protocols to be inio, where they fit better thematically, but that would require importingtyping with unforseeable (performance) implications for such a foundational module.
I deliberated using tighter protocols, but for ease of use reasons – and since they are mostly an alternative to usingIO et al. – I went with larger protocols for now.
Seekable (with methodsseek andtell) would be an interesting addition as well, but I think this should wait for easy protocol composition.

srittau added4 commits

December 5, 2024 20:02

Add a note about Iterable

1525e05

Fix docs formatting

7867ec1

Small wording improvements

6a22a02

Simplify the docs/improve formatting

5d632a3

Copy link

ContributorAuthor

srittau commentedDec 5, 2024

https://cpython-previews--127648.org.readthedocs.build/en/127648/library/typing.html#abcs-and-protocols-for-working-with-i-o

sobolevn reviewed

Dec 5, 2024

View reviewed changes

Lib/typing.py OutdatedShow resolvedHide resolved

picnixz reviewed

Dec 5, 2024

View reviewed changes

Copy link

Member

picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm a bit sad that we can't use covariance and contravariance in the protocols since a subclass could use it with an invariant type.

Lib/typing.py OutdatedShow resolvedHide resolved

Doc/library/typing.rst OutdatedShow resolvedHide resolved

Explicitly document the methods

4d50c2e

Small improvements to the docstrings and signature

JelleZijlstra reviewed

Dec 5, 2024

View reviewed changes

Doc/library/typing.rst OutdatedShow resolvedHide resolved

Doc/library/typing.rst Outdated


		Protocol for reading from a file or other input stream.

		.. method:: read(size=...)

Copy link

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change

	..method::read(size=...)
	..method::read(size=..., /)

(Same for other methods)

Copy link

ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I was unsure of how much of the signature to document. For example, would it also make sense to add argument and/or return types? I've added the slashes for now.

Doc/library/typing.rst OutdatedShow resolvedHide resolved

Lib/typing.py OutdatedShow resolvedHide resolved

Mark protocol members as abstract

1e1ea41

This was referencedDec 6, 2024

Protocol should be documented as usingABCMeta as metaclass#127677

Open

Protocols inwsgiref.types should use@abstractmethod#127678

Open

srittau added4 commits

December 6, 2024 12:01

Add .. versionadded

56a38a0

Added slashes to documented signatures

f2c331b

Fix overindentation

6764b6a

Fix documentation of Reader.__iter__()

022acaa

Copy link

Member

AlexWaygood commentedDec 6, 2024

I would prefer these protocols to be inio, where they fit better thematically, but that would require importingtyping with unforseeable (performance) implications for such a foundational module.

io feels like a more natural home to me too. Could you not do something similar to the pseudo-protocol ABCs incollections.abc,contextlib andos.path? The approach there is:

Make the classes ABCs rather than protocols
But document them as protocols
And write__subclasshook__ methods so that they behave identically to runtime-checkable Protocols in theirisinstance()/issubclass() behaviour:
cpython/Lib/contextlib.py
Lines 34 to 38 in023b7d2
@classmethod
def__subclasshook__(cls,C):
ifclsisAbstractContextManager:
return_collections_abc._check_methods(C,"__enter__","__exit__")
returnNotImplemented

And special-case the ABCs intyping.py so that thetyping module treats them identically to protocols:

cpython/Lib/typing.py

Lines 1940 to 1948 in023b7d2

	_PROTO_ALLOWLIST= {
	'collections.abc': [
	'Callable','Awaitable','Iterable','Iterator','AsyncIterable',
	'AsyncIterator','Hashable','Sized','Container','Collection',
	'Reversible','Buffer',
	],
	'contextlib': ['AbstractContextManager','AbstractAsyncContextManager'],
	'os': ['PathLike'],
	}

And pretend in typeshed that they're protocols, so that type checkers treat them as protocols (I feel likepython/typeshed@f554f54 missed the point there slightly)

Copy link

ContributorAuthor

srittau commentedDec 6, 2024

That feels terribly hackish to me, even if there is precedence. And in my experience, hackish stuff often falls on one's feet one way or the other. But I can move it toio if that's the consensus.

Copy link

Member

AlexWaygood commentedDec 6, 2024

This has been the way thecollections.abc andcontextlib ABCs have always worked. As far as I'm aware, it's worked pretty well up to now. And it doesn't actually feel like the most hackish part ofcollections.abc ;)

cpython/Lib/collections/__init__.py

Lines 32 to 33 in023b7d2

	_sys.modules['collections.abc']=_collections_abc
	abc=_collections_abc

Copy link

ContributorAuthor

srittau commentedDec 6, 2024

Well, considering thattyping.is_protocol uses the_is_protocol marker to determine whether something is a protocol, I guess that's the official way to mark protocols at runtime, not the fact that they are derived fromProtocol. Whichcontradicts the typing spec at the moment. This all seems a bit precarious to me.

Copy link

Member

AlexWaygood commentedDec 6, 2024

I'm open to something less hacky if you can find a generalized solution that doesn't make_collections_abc depend ontyping (which would significantly slow down startup for all Python applications, since_collections_abc is imported as part of Python's initialisation).

Copy link

Member

AlexWaygood commentedDec 6, 2024

considering thattyping.is_protocol uses the_is_protocol marker to determine whether something is a protocol, I guess that's the official way to mark protocols at runtime, not the fact that they are derived fromProtocol. Whichcontradicts the typing spec at the moment.

I don't think the typing spec needs to be cognisant of the hacks we use to make things work at runtime in the stdlib! This knowledge is irrelevant for third-party users of protocols, to users of type checkers and to implementers of type checkers. It's definitely not encouraged for any external users to make use of the_is_protocol attribute.

Copy link

ContributorAuthor

srittau commentedDec 6, 2024

CI seems flaky, failures are unrelated.

Remove the @runtime_checkable flags

b86073d

Fix doc reference to decorator

ca72c19

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

AA-Turner reviewed

Feb 27, 2025

View reviewed changes

Doc/library/io.rst OutdatedShow resolvedHide resolved

AA-Turner reviewed

Feb 27, 2025

View reviewed changes

Doc/library/typing.rstShow resolvedHide resolved

AA-Turner reviewed

Feb 27, 2025

View reviewed changes

Doc/library/typing.rst OutdatedShow resolvedHide resolved

srittauand others added2 commits

February 27, 2025 13:08

Fix references in docs

3b5975e

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

Split signature

96080fe

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

AlexWaygood reviewed

Feb 27, 2025

View reviewed changes

Copy link

Member

AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This LGTM now but it might be good to have one or two simple tests covering the new code. It probably doesn't need to be much more than this (added totest_io.py)

cpython/Lib/test/test_typing.py

Lines 4260 to 4262 ina083633

	deftest_supports_index(self):
	self.assertIsSubclass(int,typing.SupportsIndex)
	self.assertNotIsSubclass(str,typing.SupportsIndex)

and maybe an addition to thetest_typing tests similar to this one:

cpython/Lib/test/test_typing.py

Lines 4274 to 4303 ina083633

	deftest_collections_protocols_allowed(self):
	@runtime_checkable
	classCustom(collections.abc.Iterable,Protocol):
	defclose(self): ...

	classA:pass
	classB:
	def__iter__(self):
	return []
	defclose(self):
	return0

	self.assertIsSubclass(B,Custom)
	self.assertNotIsSubclass(A,Custom)

	@runtime_checkable
	classReleasableBuffer(collections.abc.Buffer,Protocol):
	def__release_buffer__(self,mv:memoryview)->None: ...

	classC:pass
	classD:
	def__buffer__(self,flags:int)->memoryview:
	returnmemoryview(b'')
	def__release_buffer__(self,mv:memoryview)->None:
	pass

	self.assertIsSubclass(D,ReleasableBuffer)
	self.assertIsInstance(D(),ReleasableBuffer)
	self.assertNotIsSubclass(C,ReleasableBuffer)
	self.assertNotIsInstance(C(),ReleasableBuffer)

srittau added3 commits

February 27, 2025 13:38

Document that Reader and Writer are generic

3723370

Add tests

76003a8

Add missing import

43e23f0

AlexWaygood approved these changes

Feb 27, 2025

View reviewed changes

Copy link

Member

AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is great, thank you! Just a few docs nits remaining

Doc/library/io.rst OutdatedShow resolvedHide resolved

Doc/library/typing.rst OutdatedShow resolvedHide resolved

Lib/io.py OutdatedShow resolvedHide resolved

bedevere-appbot added awaiting merge and removed awaiting review labels

Feb 27, 2025

Doc fixes

c644770

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

Copy link

Member

AlexWaygood commentedFeb 27, 2025•
edited
Loading

I'll leave it a little bit in case any of the other reviewers here have any remaining concerns before merging

AA-Turner reviewed

Feb 27, 2025

View reviewed changes

Lib/io.py

		__slots__ = ()

		@abc.abstractmethod
		def read(self, size=..., /):

Copy link

Member

AA-TurnerFeb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Sorry if this has been discussed before, but I'm unsure on the runtime use ofsize=... (I didn't notice this earlier in my documentation review, sorry).

Almost every otherread(size) method I can find has a default of eitherNone or-1. I also can't find another method in the stdlib with a default of... (outside of the recently-added protocols inwsgiref.types).

Would it be better to havesize=-1, to indicate that the method takes anint? I'm not sure how much we want typeshed-like practices to leak into the standard library.

Suggested change

	defread(self,size=...,/):
	defread(self,size=-1,/):

Copy link

ContributorAuthor

srittauFeb 27, 2025•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Mandating defaults is not really something you can do in a protocol. I also wouldn't want to mandate that implementors have to use a default of-1, because – as you said – some implementations useNone.

Copy link

Member

AA-TurnerFeb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Right, but this isn't a protocol -- it's an ABC, which do have defaults -- see e.g.collections.abc.Generator. All of theread() methods inio are documented as havingread(size=-1, /), and given these ABCs are going intoio, I think we should be consistent with that interface, or have good reason to diverge from it (& document why).

Copy link

ContributorAuthor

srittauFeb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's supposed to be a protocol, not an ABC. (Notwithstanding the fact that all protocols are ABCs.) It's just a protocol in the implementation for performance reasons. And using-1 as a default would give users the very wrong impression that they can useread(-1) when that may or may not actually be supported.

Copy link

Member

AA-TurnerFeb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

using-1 as a default would give users the very wrong impression that they can useread(-1) when that may or may not actually be supported.

From the documentation ofio.RawIOBase.read():

Read up tosize bytes from the object and return them. As a convenience, ifsize is unspecified or -1, all bytes until EOF are returned.

I would expect theio.Reader.read() ABC/protocol to have this same guarantee, for a 'properly' implementedread() method (according to theio expecations). In the proposed documentation, we say:

Read data from the input stream and return it. Ifsize is specified, it should be an integer, and at most size items (bytes/characters) will be read.

This forbidsNone (good!), but is silent on what happens shouldsize be omitted. I still think we should use-1 instead of..., but at the very least we should include in the documentation the contract for what happens whenread() is called with no arguments.

Copy link

ContributorAuthor

srittauFeb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I disagree.-1 is the default forsome implementations, but the protocol should not make any default mandatory.

Copy link

Member

AA-TurnerFeb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm still not sure I follow, though -- currently the ABC/protocol has the mandatory default ofsize=..., which is always invalid and not the correct type.-1 is valid for all interfaces specified by theio documentation, which is where this new type is being added.

I ran the following withmypy --strict and it passed, so I don't think type checkers care about default values (and as discussed, the type forbids using non-integer types):

importabcclassReader(metaclass=abc.ABCMeta):__slots__= ()@abc.abstractmethoddefread(self,size:int=-1,/)->bytes:passclassCustomReader(Reader):defread(self,size:int=-1,/)->bytes:returnb''classCustomReaderZero(Reader):defread(self,size:int=0,/)->bytes:returnb''assertissubclass(CustomReader,Reader)assertissubclass(CustomReaderZero,Reader)assertisinstance(CustomReader(),Reader)assertisinstance(CustomReaderZero(),Reader)defreader_func(r:Reader)->None:r.read()reader_func(CustomReader())reader_func(CustomReaderZero())

Copy link

Member

JelleZijlstraMar 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I agree with Sebastian here; we should use... because the protocol need not mandate any particular default.

-1 is valid for all interfaces specified by the io documentation

The protocol should also match other file-like classes defined elsewhere in the stdlib or even in third-party libraries. When defining a protocol it's often useful to be permissive, so that all objects that are intended to match the protocol actually match it.

JelleZijlstra approved these changes

Mar 1, 2025

View reviewed changes

Merge branch 'main' into typing-readable-writable

bfab2fd

Copy link

ContributorAuthor

srittau commentedMar 6, 2025

Could this be merged before the next alpha next week?

JelleZijlstra merged commitc6dd234 intopython:main

Mar 6, 2025

39 checks passed

bedevere-appbot removed the awaiting merge label

Mar 6, 2025

srittau deleted the typing-readable-writable branch

March 6, 2025 16:02

Copy link

bluetech commentedMar 7, 2025

I only saw the update now, great that it's merged!

I have a few comments. I hope they're not too bothersome - I just think these protocols are important so it's worth the effort.

Can the docs include type annotations? I know that Sphinx supports it. Given that this is meant for static typing, it makes sense to me to include.

I think it is important to document the expected semantics of these methods, otherwise you don't really know what you're going to get when you accept aReader orWriter. What I can think of:

What should happen on EOF?
Is EOF sticky?
Short reads/writes
What does it mean to omitsize?
What happens onsize 0?
Issize allowed to be -1 or more generally negative?
Canread return value beNone? Empty?
Canwrite return value be 0?
Expected exceptions --IOError?

I don't like the parameter namesize, because from the caller perspective it is not exactly a size of anything. I would have preferred justn. But, I see this name is already used e.g.IOBase and consistency probably wins the argument here.

The docs say "If size is specified, it should be an integer, and at most size items (bytes/characters) will be read" and "Write data to the output stream and return the number of items (bytes/characters) written". I would drop the "(bytes/characters)" sinceT is not necessarily that.

The docs say "The following protocols can be used for annotating function and method arguments for simple stream reading or writing operations". I have two nitpicks:

What is annotated is the parameter, not the argument (= a value for a parameter in a specific invocation). So would change "arguments" -> "parameters".
I don't think it is just for simple operations, so I'd remove "simple".

I thinkT should have some bound. I think at leastSized, it's hard to interpret the return value without a notion of size.

Copy link

ContributorAuthor

srittau commentedMar 9, 2025

A merged PR is probably not the best place to discuss this, but a few points:

Can the docs include type annotations?

Personally, I'd be fine with, and actually prefer if the docs used type annotations more, but I didn't want to divert from the existing style in the documentation.

I think it is important to document the expected semantics of these methods

While it would make sense to me to document some of those semantics (e.g. size must be>= 1), most are not part of this protocol. Exceptions and EOF are not concepts that are understood by those protocols.

I would drop the "(bytes/characters)" since T is not necessarily that.

It might have been clearer to say "(e.g. bytes or characters)", but I think calling out those specific types here makes the documentation clearer, as in 99% of cases it will be either of those. Otherwise the concept of "item" is a bit nebulous.

I think T should have some bound.

I don't think it's necessary to make the protocol more complex. Whoever is using the protocol will have to choose an type that they can work with.

Copy link

ContributorAuthor

srittau commentedMar 9, 2025

Also in general, protocols are not a replacement for API documentation. It doesn't make too much sense to include expectations around a protocol that can't be checked statically. To document these expectations is the responsibility of the author of a function that uses these protocols, and to follow them is the responsibility of the user. These expectations can vary significantly from function to function.

Copy link

bluetech commentedMar 10, 2025

(Please feel free to ignore this if you don't want to discuss this further -- I couldn't resist posting another comment...)

If I understand correctly, you prefer a narrow approach to these protocols because 1) they can't be checked statically 2) semantics can vary between uses. My replies:

In an ideal world, it would be possible to convey all semantics in a static way. But if it's not possible, documentation is the alternative. Stated differently, the Platonic ideal of theReader interface contains all of its (agreed) semantics, but its projection onto the Python type system is incomplete, so the documentation should try its best to fill the gap. That's how I view it at least.
Why should the semantics vary between uses? A single set of semantics has a big advantage - it reduces (N impls * M uses) coordination points to N + M. As an implementer, I can (implicitly or explicitly) follow the protocol and don't need to invent or document my semantics. As a user, I write to a single set of semantics, not needing to handle multiple sets, defensively protect against unknown ones, or document my expectations. As a "middleware" (e.g. aBufferedReader which adds buffering to any reader), I can take any generic reader. What's the advantage of allowing implementations to be inconsistent on something like how EOF is signaled?