Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-134706: Return bytes written from codecs.Stream(Reader)Writer.write()#134708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
srittau wants to merge5 commits intopython:main
base:main
Choose a base branch
Loading
fromsrittau:codecs-write

Conversation

srittau
Copy link
Contributor

@srittausrittau commentedMay 26, 2025
edited by github-actionsbot
Loading

Copy link
Member

@sobolevnsobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I consider this a bug fix and propose to backport this change.

Copy link
Member

@picnixzpicnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Just a small suggestion for the NEWS

@picnixz
Copy link
Member

Actually, there is alsohttps://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamWriter which doesn't return the number of bytes as well.

@sobolevnsobolevn added needs backport to 3.13bugs and security fixes needs backport to 3.14bugs and security fixes labelsMay 26, 2025
@srittau
Copy link
ContributorAuthor

Actually, there is alsohttps://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamWriter which doesn't return the number of bytes as well.

I think this is a different case, as that seems to write asynchronously, so the number of bytes written is not known at the timewrite() returns.

picnixz reacted with thumbs up emoji

Copy link
Contributor

@cmaloneycmaloney left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks good to me.

--
There are a number of cases in test mocks which do similarly I have in my backlog to make small patches for (Have a patch locally which makes a warning ifTextIOWrapper gets something other than a number of bytes which finds a lot).

@sobolevn
Copy link
Member

btw, can we add / modify a test for this?

@srittau
Copy link
ContributorAuthor

I was originally looking into this, but noticed that there were no tests for codecs at all. I can start a test suite, of course.

@sobolevn
Copy link
Member

I am not a fan of adding big amount of test code in the same trivial PR. So, please, let's create a new one with codecs tests 🙏

@picnixz
Copy link
Member

picnixz commentedMay 27, 2025
edited
Loading

The codecs tests are in test_capi IIRC and/or in another place that I forgot

@srittau
Copy link
ContributorAuthor

srittau commentedMay 27, 2025
edited
Loading

There are actually two tests: One for the capi and one for the Python wrapper. They are in the obvious place, so I have no idea, how I could overlook them ... Will add tests.

I am not a fan of adding big amount of test code in the same trivial PR. So, please, let's create a new one with codecs tests 🙏

Well, I was only going to add a test for this specific feature anyway, not a whole suite of tests.

@srittau
Copy link
ContributorAuthor

Could this be merged so the backport gets into the next 3.13 release which is scheduled for 2025-06-03?

@AA-Turner
Copy link
Member

cc@malemburg as codecs expert and original author of this code.

Copy link
Contributor

@cmaloneycmaloney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Should this change.writelines as well?

@srittau
Copy link
ContributorAuthor

srittau commentedMay 30, 2025
edited
Loading

Should this change.writelines as well?

Looking at thedocumentation for IOBase.writelines and the existing stub annotations, it seems unusual forwritelines() to return anything butNone. In typeshed, allwritelines methods are annotated to return eitherNone orobject. (The latter probably to allow sub-classes to return something, although none seem to want to do that.) So I would recommend not to change it forcodecs.

cmaloney reacted with thumbs up emoji

@serhiy-storchaka
Copy link
Member

It returns the number of bytes written, butio.TextIOWrapper.write() returns the number of characters written.

cmaloney reacted with thumbs up emoji

@srittau
Copy link
ContributorAuthor

I don't want to be a pest, but 3.13.4 is expected tomorrow, and I would like to update typeshed after that release while this is still fresh in my mind.

@AA-Turner
Copy link
Member

I'd prefer to wait for a review from MAL, better to delay than rush a merge.

@malemburg
Copy link
Member

I'm not sure whether it's a good idea to return a value from.write(). The reason is that not theStreamWriter defines what kind of count is returned, but instead the.write() method of the underlying stream. Most of the time, this will be a bytes count, but not always.

Theio module also is not consistent with this: most objects return number of bytes, butTextIOBase uses number of characters (or rather: code points) and always returnslen(text) - not really useful information.

Overall, I'm not convinced that returning a value provides much benefit. You can use.tell() to get a defined stream position, if needed (and available).

In any case: This is a new feature, so I don't think it should go into a patch level release.

@srittau
Copy link
ContributorAuthor

Well, the current behavior has the downside that it may break code in unexpected ways, as it breaks thewrite() contract. This is especially true when aStreamWriter is wrapped by one of the many I/O wrappers. This was caught in typeshed, so this is a real-world problem.

@picnixzpicnixz removed needs backport to 3.13bugs and security fixes needs backport to 3.14bugs and security fixes labelsJun 2, 2025
@picnixz
Copy link
Member

picnixz commentedJun 2, 2025
edited
Loading

For now, I'll remove the backport labels; we'll add them back if needed. The fact that the contract is broken is a convincing argument though.

srittau reacted with thumbs up emoji

@malemburg
Copy link
Member

Well, the current behavior has the downside that it may break code in unexpected ways, as it breaks thewrite() contract. This is especially true when aStreamWriter is wrapped by one of the many I/O wrappers. This was caught in typeshed, so this is a real-world problem.

I'm not sure where you get this idea from that there is some.write() contract. As I tried to explain, different I/O implementations return different things for.write(). Some returnNone, others number of bytes written and yet others number of code points.

We could start to use the same approach as is done for theStreamReaderWriter, in that we simply return whatever the underlying stream returns for.write(), but this should then be documented as such and also made consistent with.writelines().

@srittau
Copy link
ContributorAuthor

I'm not sure where you get this idea from that there is some.write() contract. As I tried to explain, different I/O implementations return different things for.write(). Some returnNone, others number of bytes written and yet others number of code points.

The contract to return the number of items written (characters or bytes) for synchronous I/O is found throughout the standard library, both in docs in implementation. For example, all types returned byopen() return the number of items written, as does the (virtual) base classtyping.IO, which tries to be a base for "file-like" types. There are very few exceptions to file-like classes that don't return the number of items written. Apart from codecs classes, I could only findsqlite3.Blob.write() andwsgiref.validate.ErrorWrapper.write(). These should probably be fixed as well.

We could start to use the same approach as is done for theStreamReaderWriter, in that we simply return whatever the underlying stream returns for.write(), but this should then be documented as such and also made consistent with.writelines().

This is exactly what this PR does, although the documentation was removed when it was determined that this should be a bug fix.writelines() never returns anything butNone in the stdlib. Changing that would be a bigger effort.

@serhiy-storchaka
Copy link
Member

If returning the number of characters written is the part of the contract, then returning the number of bytes written breaks the contract.

Thecodecs streams were implemented before adding theio module. StreamWriter is not a part of theio class hierarchy.

Of course, been more compatible with TextIOWriter would be good. But we should not introduce a new subtle discrepancy.

@serhiy-storchaka
Copy link
Member

BTW, returning the number of bytes written is essential for unbuffered bytes stream, but I do not see a use case for text streams.

@srittau
Copy link
ContributorAuthor

If returning the number of characters written is the part of the contract, then returning the number of bytes written breaks the contract.

That's actually a good point. The writers behave like a "character writer", not a "bytes writer" to the user. The fact that they write bytes behind the scenes does not matter to the user. So instead of returning the number of bytes written, we should actually return the number of characters written (consistently, without looking at the actual codec). This would also be consistent with other text writers:

>>>withopen("foo","w")asf:...print(f.write("Föo"))...3

@cmaloney
Copy link
Contributor

For character writers and non-blocking what number of characters should be returned if only part of a multi-byte character is written? (Non-blocking I/O is used with various pieces and usually doesn't guarantee whole "characters" are written. Going from None -> int returned is a safeish API change, int -> slightly different meaning int a lot less safe).

@srittau
Copy link
ContributorAuthor

For character writers and non-blocking what number of characters should be returned if only part of a multi-byte character is written?

Non-blocking I/O generally returnsNone forwrite(), but that is a special case that's not supported by the stream writers as far as I can tell.

@cmaloney
Copy link
Contributor

cmaloney commentedJun 2, 2025
edited
Loading

None when no data is writtenand writing would block, they return number of bytes written / can do "partial" writes generally. Some (ex. BufferedIO) retry, but not all (ex. FileIO / "Raw IO" and TextIO). As far as I know there's no reliable way to know "buffers writes" and/or "will retry partial writes" for a "file-like" at the moment... For "Raw IO" it is well specified and tested what happens, and BufferedIO has a consistent write behavior (write raises BlockingIOError with acharacters_written member set to the number of bytes written, read returns number of bytes). "Text IO" behavior wasn't specified in the initial I/O spec (PEP-3116"Non-Blocking I/O"). Text I/O can be used directly with Raw I/O sometimes (and is), which has multiple open long-standing bugs (gh-57531 is probably the most up to date discussion one)

Also note: the implementation and docs differ on some cases although I'm actively working on getting the docs and implementation to match.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@cmaloneycmaloneycmaloney approved these changes

@sobolevnsobolevnsobolevn approved these changes

@picnixzpicnixzpicnixz approved these changes

@malemburgmalemburgAwaiting requested review from malemburg

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

7 participants
@srittau@picnixz@sobolevn@AA-Turner@serhiy-storchaka@malemburg@cmaloney

[8]ページ先頭

©2009-2025 Movatter.jp