Issue 8260

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: When I use codecs.open(...) and f.readline() follow up by f.read() return bad result
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.3, Python 3.4, Python 2.7

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: ajaksu2, amaury.forgeotdarc, eric.araujo, harobed, lemburg, ncoghlan, python-dev, r.david.murray, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2010-03-29 15:09 by harobed, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name: Uploaded
codecs_read.patch: amaury.forgeotdarc, 2010-03-31 10:00
codecs_read-2.patch: amaury.forgeotdarc, 2010-03-31 11:11
codecs_read-3.patch: serhiy.storchaka, 2014-01-10 19:40
Messages (15)
msg101892 - Author: harobed (harobed), Date: 2010-03-29 15:09
This is an example; the last assert raises an error:

f = open('data.txt', 'w')
f.write("""line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
""")
f.close()

f = open('data.txt', 'r')
assert f.readline() == 'line 1\n'
assert f.read() == """line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
"""
f.close()

import codecs

f = codecs.open('data.txt', 'r', 'utf8')
assert f.read() == """line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
"""
f.close()

f = codecs.open('data.txt', 'r', 'utf8')
assert f.readline() == 'line 1\n'

# this assert raises an error
assert f.read() == """line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
line 11
"""
f.close()

Regards,
Stephane
msg101980 - Author: Daniel Diniz (ajaksu2) (Python triager), Date: 2010-03-31 06:12
Hi Stephane,

I think you're seeing different buffering behavior, which I suspect is correct according to the docs. codecs.open should default to line buffering [1], while open uses the system default [2].

The read() where the assert fails is returning the remaining buffer from the readline (which read 72 chars). Asserting e.g. "f.read(1024) == ..." will give you the expected result.

[1] http://docs.python.org/library/codecs.html#codecs.open
[2] http://docs.python.org/library/functions.html#open
msg101987 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer), Date: 2010-03-31 10:00
Buffering applies when writing, not when reading a file.

There is indeed a problem in codecs.py: after a readline(), read() will return the content of the internal buffer, and not more.
The "size" parameter is a hint, and should not be used to decide whether the character buffer is enough to satisfy the read() request.

Patch is attached, with test.
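For readers who want to check their own interpreter, here is a minimal sketch of the sequence described above. It assumes an interpreter that already contains the fix, and it uses `codecs.getreader()` over an in-memory `io.BytesIO` instead of `codecs.open()` on a real file. The helper name `readline_then_read` and the constant `TEXT` are made up for this illustration. The payload is modelled on the report and is longer than the 72 characters mentioned earlier in the thread, so `readline()` leaves part of the data unread in the underlying stream.

```python
import codecs
import io

# Eleven short lines, 79 bytes in total (hypothetical payload modelled on the report).
TEXT = "".join("line %d\n" % n for n in range(1, 12))


def readline_then_read(raw, encoding="utf-8"):
    """Return (first line, whatever read() yields right afterwards)."""
    reader = codecs.getreader(encoding)(io.BytesIO(raw))
    first = reader.readline()
    rest = reader.read()
    return first, rest


first, rest = readline_then_read(TEXT.encode("utf-8"))
# On a fixed interpreter the two pieces should add up to the whole text.
print(repr(first), first + rest == TEXT)
```

On an affected interpreter, `rest` would contain only the characters left in the reader's internal buffer, which is the failure reported in this issue.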
msg101988 - Author: Marc-Andre Lemburg (lemburg) (Python committer), Date: 2010-03-31 10:28
Amaury Forgeot d'Arc wrote:
> Buffering applies when writing, not when reading a file.
> There is indeed a problem in codecs.py: after a readline(), read() will return the content of the internal buffer, and not more.
> The "size" parameter is a hint, and should not be used to decide whether the character buffer is enough to satisfy the read() request.
> Patch is attached, with test.

Agreed.

The patch looks good except the if-line should read:

if chars >= 0 and len(self.charbuffer) >= chars:
    ...

Thanks,
--
Marc-Andre Lemburg
eGenix.com
msg101990 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer), Date: 2010-03-31 11:11
Updated patch.
[I also tried to avoid reading the underlying file if len(self.bytebuffer) >= size, but it does not work with multibyte chars when size=1]
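The multibyte remark can be illustrated with the standard incremental decoder: a single buffered byte may be only the first half of a character, so having `size` bytes on hand does not guarantee that any character can be produced. This is a small sketch using `codecs.getincrementaldecoder()`, not the code path from the patch; the variable names are chosen for this illustration.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()
encoded = "é".encode("utf-8")  # two bytes in UTF-8

# Feeding only the first byte yields no character yet; the decoder keeps it pending.
first = decoder.decode(encoded[:1])
# The second byte completes the character.
second = decoder.decode(encoded[1:])

print(repr(first), repr(second))
```

With size=1, a reader that refuses to touch the underlying stream because one byte is already buffered would therefore never make progress on such input.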
msg122823 - Author: Éric Araujo (eric.araujo) (Python committer), Date: 2010-11-29 16:17
I applied the diff to test_codecs in py3k, removed the u prefixes and ran: failure.  I applied the fix and the test passed.
msg138265 - Author: harobed (harobed), Date: 2011-06-13 17:55
Up, I think this patch isn't applied in Python 3.3a0.
msg138273 - Author: R. David Murray (r.david.murray) (Python committer), Date: 2011-06-13 19:43
According to this ticket it hasn't been applied anywhere yet (a message will be posted here when it is).
msg139465 - Author: STINNER Victor (vstinner) (Python committer), Date: 2011-06-30 08:08
See also #12446.
msg177119 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer), Date: 2012-12-07 19:52
I think the patch is wrong or is not optimal for the case when chars is -1, but size is not.

If we want to read all data in any case, then we should call self.stream.read() without argument if chars < 0 or size < 0.

If we want to read no more than size bytes, then all the loop code should be totally rewritten.

Perhaps I am wrong.
msg177123 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer), Date: 2012-12-07 20:04
As shown in issue12446, issue14475 and issue16636, there are different ways to reproduce this bug (read(size, chars) + readlines(), readline() + readlines()). All these cases should be tested.
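The two call combinations named above can be checked in the same style. This is only a sketch under the assumption of a fixed interpreter; the helper names (`new_reader`, `readline_then_readlines`, `read_then_readlines`), the payload `TEXT`, and the arguments 10 and 4 are chosen for this illustration and are not taken from the linked issues or from the committed tests.

```python
import codecs
import io

# Hypothetical payload: eleven short lines, each ending in a newline.
TEXT = "".join("line %d\n" % n for n in range(1, 12))


def new_reader():
    """Fresh UTF-8 StreamReader over an in-memory copy of TEXT."""
    return codecs.getreader("utf-8")(io.BytesIO(TEXT.encode("utf-8")))


def readline_then_readlines():
    """readline() followed by readlines(); returns all lines as a list."""
    reader = new_reader()
    return [reader.readline()] + reader.readlines()


def read_then_readlines(size, chars):
    """read(size, chars) followed by readlines(); returns the joined text."""
    reader = new_reader()
    head = reader.read(size, chars)
    return head + "".join(reader.readlines())


print(readline_then_readlines() == TEXT.splitlines(True))
print(read_then_readlines(10, 4) == TEXT)
```

Both comparisons should hold once no buffered characters or unread stream data are dropped between the calls.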
msg207875 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer), Date: 2014-01-10 19:40
Here is a revised patch.

* Behavior is changed less. read() is less greedy and uses characters from the buffer when read() is called with only one argument (size). It is now a little closer to io stream's read() than with the previous patch.
* Added tests for the cases of issue12446 and issue16636.
* Fixed read() for the TransformCodecTest.test_read test added in 3.4. Actually the uu_codec and zlib_codec are broken.
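The first point, read(size) being served from already buffered characters, can be observed with a short sketch. It assumes an interpreter that includes the committed fix; the payload `TEXT` and the variable names are made up for this illustration.

```python
import codecs
import io

# Hypothetical payload: eleven short lines, each ending in a newline.
TEXT = "".join("line %d\n" % n for n in range(1, 12))

reader = codecs.getreader("utf-8")(io.BytesIO(TEXT.encode("utf-8")))

first = reader.readline()  # leaves further characters in the reader's buffer
chunk = reader.read(5)     # one argument: at most 5 characters, taken from that buffer
rest = reader.read()       # no argument: everything that remains

print(repr(first), repr(chunk), first + chunk + rest == TEXT)
```

The single-argument call returns a short chunk, while the argument-less call still drains the stream, so the three pieces together reproduce the original text.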
msg209330 - Author: Alyssa Coghlan (ncoghlan) (Python committer), Date: 2014-01-26 15:40
Patch looks good to me, but if any specific features are needed to work around misbehaving codecs (as per issue 20132), a comment in the appropriate place referencing that issue would be helpful.

And if that workaround means we can remove the special casing from the test_readlines test for the binary transform, cool :)
msg209335 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer), Date: 2014-01-26 16:23
Actually this patch doesn't work around misbehaving codecs. It just makes specific tests (one readline, one read) pass. More complex tests which use multiple readline() or read() calls can still fail with these misbehaving codecs.
msg209337 - Author: Roundup Robot (python-dev) (Python triager), Date: 2014-01-26 17:30
New changeset e24265eb2271 by Serhiy Storchaka in branch '2.7':
Issue #8260: The read(), readline() and readlines() methods of
http://hg.python.org/cpython/rev/e24265eb2271

New changeset 9c96c266896e by Serhiy Storchaka in branch '3.3':
Issue #8260: The read(), readline() and readlines() methods of
http://hg.python.org/cpython/rev/9c96c266896e

New changeset b72508a785de by Serhiy Storchaka in branch 'default':
Issue #8260: The read(), readline() and readlines() methods of
http://hg.python.org/cpython/rev/b72508a785de
History
Date, User, Action, Args
2022-04-11 14:56:59, admin, set
    github: 52507
2014-01-26 17:34:32, serhiy.storchaka, set
    status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2014-01-26 17:30:36, python-dev, set
    nosy: +python-dev
    messages: +msg209337
2014-01-26 16:23:24, serhiy.storchaka, set
    messages: +msg209335
2014-01-26 15:40:05, ncoghlan, set
    messages: +msg209330
2014-01-26 10:23:08, serhiy.storchaka, set
    assignee: serhiy.storchaka
2014-01-21 20:34:58, serhiy.storchaka, set
    nosy: +ncoghlan
2014-01-10 19:40:38, serhiy.storchaka, set
    files: +codecs_read-3.patch
    messages: +msg207875
    versions: - Python 3.2
2012-12-07 20:04:06, serhiy.storchaka, set
    messages: +msg177123
2012-12-07 20:03:53, serhiy.storchaka, link
    issue16636 superseder
2012-12-07 20:03:38, serhiy.storchaka, link
    issue14475 superseder
2012-12-07 20:03:22, serhiy.storchaka, link
    issue12446 superseder
2012-12-07 19:52:34, serhiy.storchaka, set
    nosy: +serhiy.storchaka
    messages: +msg177119
    versions: + Python 3.4
2011-06-30 08:08:06, vstinner, set
    messages: +msg139465
2011-06-13 19:43:50, r.david.murray, set
    nosy: +r.david.murray
    messages: +msg138273
    versions: + Python 3.3, - Python 3.1
2011-06-13 19:31:20, vstinner, set
    nosy: +vstinner
2011-06-13 17:55:55, harobed, set
    messages: +msg138265
2010-11-29 16:17:29, eric.araujo, set
    nosy: +eric.araujo
    title: When I use codecs.open(...) and f.readline() follow up byf.read() return bad result -> When I use codecs.open(...) and f.readline() follow up by f.read() return bad result
    messages: +msg122823
    versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2010-03-31 11:11:07, amaury.forgeotdarc, set
    files: +codecs_read-2.patch
    messages: +msg101990
2010-03-31 10:28:20, lemburg, set
    nosy: +lemburg
    title: When I use codecs.open(...) and f.readline() follow up by f.read() return bad result -> When I use codecs.open(...) and f.readline() follow up byf.read() return bad result
    messages: +msg101988
2010-03-31 10:00:19, amaury.forgeotdarc, set
    files: +codecs_read.patch
    nosy: +amaury.forgeotdarc
    messages: +msg101987
    keywords: +patch
    stage: test needed -> patch review
2010-03-31 06:12:21, ajaksu2, set
    priority: normal
    nosy: +ajaksu2
    messages: +msg101980
    stage: test needed
2010-03-29 15:09:53, harobed, create