Issue 32110

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Make codecs.StreamReader.read() more compatible with read() of other files
Type: behavior
Stage: resolved
Components: IO, Library (Lib)
Versions: Python 3.7, Python 3.6, Python 2.7

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: lemburg, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2017-11-22 07:40 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL      Status  Linked
PR 4499  merged  serhiy.storchaka, 2017-11-22 07:47
PR 4622  merged  python-dev, 2017-11-28 23:30
PR 4623  merged  python-dev, 2017-11-28 23:31
Messages (6)
msg306701 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-22 07:40

Usually the read() method of a file-like object takes one optional argument which, if specified, limits the amount of data (the number of bytes or characters) returned.

codecs.StreamReader.read() also has such a parameter, but it is the second parameter. The first parameter limits the number of bytes read for decoding. read(1) can return 70 characters, which will confuse most callers, since they expect either a single character or an empty string (at the end of the stream).

Some time ago codecs.open() was recommended as a replacement for the builtin open() in programs that should work in both 2.x and 3.x (this was before io.open() was added), and it is still used in many programs. But this peculiarity makes it a bad replacement for the builtin open().

I wanted to fix this issue a long time ago, but forgot, and a question on Stack Overflow has reminded me about it.

https://stackoverflow.com/questions/46437761/codecs-openutf-8-fails-to-read-plain-ascii-file
msg306705 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2017-11-22 09:20

On 22.11.2017 08:40, Serhiy Storchaka wrote:
> Usually the read() method of a file-like object takes one optional argument which limits the amount of data (the number of bytes or characters) returned if specified.
>
> codecs.StreamReader.read() also has such parameter. But this is the second parameter. The first parameter limits the number of bytes read for decoding. read(1) can return 70 characters, that will confuse most callers which expect either a single character or an empty string (at the end of stream).

That's not true. .read(1) will at most read 1 byte from the stream and decode it. There's no way it will return 70 characters. It will usually return less chars than the number of bytes read.

The reasoning here is the same as for .read() on regular byte streams in Python 2.x: the first argument size tells the reader how many bytes to read for decoding, since this is needed to properly work together with .seek().

The optional second parameter chars was added as convenience, since the user may not know how many bytes need to be read in order to decode a certain number of characters.

That said, I see in your patch that you want to bind chars to size. That will work and also protect the user from the unlikely case where the codec returns more chars than bytes read.
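A minimal sketch of the two limits described in this message, assuming a UTF-8 reader over an in-memory stream. The sample text and the variable names are illustrative only and do not come from the patch.

```python
import codecs
import io

# 30 characters, each encoded as 2 bytes in UTF-8 (60 bytes total).
text = "äöü" * 10
stream = io.BytesIO(text.encode("utf-8"))
reader = codecs.getreader("utf-8")(stream)

# First argument (size): roughly how many bytes to pull from the
# underlying stream per internal read.
# Second argument (chars): upper bound on decoded characters returned.
piece = reader.read(4, 3)
print(repr(piece))

# A plain read() drains whatever is left, including characters that
# were already decoded and buffered by the previous call.
rest = reader.read()
print(len(piece), len(rest))
```

Because each character here takes two bytes, a 4-byte read decodes only two characters, so the reader has to go back to the stream a second time before it can hand out three.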
msg306710 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-22 09:56

> That's not true. .read(1) will at most read 1 byte from the stream
> and decode it. There's no way it will return 70 characters.

See the added tests. They fail without the change to the read() method.

.read(1) currently returns all characters from the character buffer. And this buffer can be non-empty after .readline().

I understand the reason for having two limiting parameters in StreamReader.read(). But currently its behavior does not completely match the expected behavior of a read() method with one argument.

Actually, size is already used instead of chars if chars < 0 when reading in a loop. The code can be simplified.
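The buffered-characters case from this message can be reproduced with a short script. This is a sketch, not one of the tests added in the pull request; the data and names are made up for illustration. On an interpreter that includes the fix, read(1) should return a single character. On an unfixed one, per the discussion above, it would return everything left in the character buffer after readline().

```python
import codecs
import io

# One short line followed by a long run of characters, so that
# readline() decodes more than the first line and buffers the rest.
data = b"a\n" + b"b" * 100
reader = codecs.getreader("utf-8")(io.BytesIO(data))

line = reader.readline()  # returns the first line, buffers the surplus
ch = reader.read(1)       # with the fix: at most one character
rest = reader.read()      # everything that remains

print(repr(line), len(ch), len(rest))
```

Checking that line + ch + rest equals the decoded input confirms that no characters are lost or duplicated when switching between readline() and read().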
msg307191 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-28 23:30

New changeset 219c2de5ad0fdac825298bed1bb251f16956c04a by Serhiy Storchaka in branch 'master':
bpo-32110: codecs.StreamReader.read(n) now returns not more than n (#4499)
https://github.com/python/cpython/commit/219c2de5ad0fdac825298bed1bb251f16956c04a

msg307194 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-29 00:06

New changeset 230ffeae0a3961b1769806bd722c26227c84e8da by Serhiy Storchaka (Miss Islington (bot)) in branch '3.6':
bpo-32110: codecs.StreamReader.read(n) now returns not more than n (GH-4499) (#4622)
https://github.com/python/cpython/commit/230ffeae0a3961b1769806bd722c26227c84e8da

msg307196 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-29 00:15

New changeset fc73c54dae46e6c47dcd4a535f7bc68a46b8e398 by Serhiy Storchaka (Miss Islington (bot)) in branch '2.7':
bpo-32110: codecs.StreamReader.read(n) now returns not more than n (GH-4499) (#4623)
https://github.com/python/cpython/commit/fc73c54dae46e6c47dcd4a535f7bc68a46b8e398
History
Date                 User              Action  Args
2022-04-11 14:58:54  admin             set     github: 76291
2017-11-29 00:16:20  serhiy.storchaka  set     status: open -> closed
                                               resolution: fixed
                                               stage: patch review -> resolved
2017-11-29 00:15:45  serhiy.storchaka  set     messages: +msg307196
2017-11-29 00:06:55  serhiy.storchaka  set     messages: +msg307194
2017-11-28 23:31:10  python-dev        set     pull_requests: +pull_request4538
2017-11-28 23:30:14  python-dev        set     pull_requests: +pull_request4537
2017-11-28 23:30:02  serhiy.storchaka  set     messages: +msg307191
2017-11-22 09:56:57  serhiy.storchaka  set     messages: +msg306710
2017-11-22 09:20:56  lemburg           set     nosy: +lemburg
                                               messages: +msg306705
2017-11-22 07:47:23  serhiy.storchaka  set     keywords: +patch
                                               stage: patch review
                                               pull_requests: +pull_request4437
2017-11-22 07:40:16  serhiy.storchaka  create
