
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2016-03-07 10:10 bymaubp, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| issue26499.diff | SilentGhost,2016-03-07 17:49 | review | ||
| issue26499_2.diff | SilentGhost,2016-03-09 08:30 | review | ||
| issue26499_3.diff | SilentGhost,2016-03-09 19:58 | review | ||
| issue26499_4.diff | SilentGhost,2016-03-12 15:36 | review | ||
| issue26499_5.diff | SilentGhost,2016-03-13 11:38 | review | ||
| issue26499_6.diff | martin.panter,2016-03-15 05:59 | review | ||
| Messages (12) | |||
|---|---|---|---|
| msg261287 -(view) | Author: Peter (maubp) | Date: 2016-03-07 10:10 | |
This is a regression in Python 3.5 tested under Linux and Mac OS X, spotted from a failing test in Biopythonhttps://github.com/biopython/biopython/issues/773 where we would parse a file from the internet. The trigger is partially reading the network handle line by line (e.g. until an end record marker is found), and then calling handle.read() to fetch any remaining data. Self contained examples below.Note that partially reading a file like this still works:$ python3.5Python 3.5.0 (default, Sep 14 2015, 12:13:24) [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> >>> from urllib.request import urlopen>>> handle = urlopen("http://www.python.org")>>> chunk = handle.read(50)>>> rest = handle.read()>>> handle.close()However, the following variants read a few lines and then attempt to call handle.read() and fail. The URL is not important (as long as it has over four lines in these examples).Using readline,>>> from urllib.request import urlopen>>> handle = urlopen("http://www.python.org")>>> for i in range(4):... line = handle.readline()... >>> rest = handle.read()Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/xxx/lib/python3.5/http/client.py", line 446, in read s = self._safe_read(self.length) File "/Users/xxx/lib/python3.5/http/client.py", line 594, in _safe_read raise IncompleteRead(b''.join(s), amt)http.client.IncompleteRead: IncompleteRead(46698 bytes read, 259 more expected)Using line iteration via next,>>> from urllib.request import urlopen>>> handle = urlopen("http://www.python.org")>>> for i in range(4):... line = next(handle)... >>> rest = handle.read()Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/xxx/lib/python3.5/http/client.py", line 446, in read s = self._safe_read(self.length) File "/Users/xxx/lib/python3.5/http/client.py", line 594, in _safe_read raise IncompleteRead(b''.join(s), amt)http.client.IncompleteRead: IncompleteRead(46698 bytes read, 259 more expected)Using line iteration directly,>>> from urllib.request import urlopen>>> count = 0>>> handle = urlopen("http://www.python.org")>>> for line in handle:... count += 1... if count == 4:... break... >>> rest = handle.read()Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/xxx/lib/python3.5/http/client.py", line 446, in read s = self._safe_read(self.length) File "/Users/xxx/lib/python3.5/http/client.py", line 594, in _safe_read raise IncompleteRead(b''.join(s), amt)http.client.IncompleteRead: IncompleteRead(46698 bytes read, 259 more expected)These examples all worked on Python 3.3 and 3.4 so this is a regression. | |||
| msg261309 -(view) | Author: SilentGhost (SilentGhost)*![]() | Date: 2016-03-07 17:49 | |
As far as I'm able to track it, it was a refactoring inissue 19009 that is responsible for this regression (rev49017c391564). I'm adding Kristján, so that he'd have a look at the attached fix and test. | |||
| msg261401 -(view) | Author: Martin Panter (martin.panter)*![]() | Date: 2016-03-09 07:29 | |
Thanks for the report and the patch. It looks okay as far as it goes, but I think there are other related bugs:## read1() doesn’t update length either ##>>> handle = urlopen("https://www.python.org/")>>> len(handle.read1())7374>>> handle.read()Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/proj/python/cpython/Lib/http/client.py", line 462, in read s = self._safe_read(self.length) File "/home/proj/python/cpython/Lib/http/client.py", line 614, in _safe_read raise IncompleteRead(b''.join(s), amt)http.client.IncompleteRead: IncompleteRead(39583 bytes read, 7374 more expected)## readline() and read1() blindly read beyond Content-Length ##>>> conn = HTTPSConnection("www.python.org")>>> conn.request("GET", "/")>>> resp = conn.getresponse()>>> resp.readlines() # Hangs until the connection is closedI wonder if anyone has considered implementing HTTPResponse by wrapping or subclassing BufferedReader; that way you get well-tested readline() etc functionality for free. The HTTP protocol (Connection: close, Content-Length, chunked decoding) could be handled by an internal RawIOBase.readinto() method. | |||
| msg261405 -(view) | Author: SilentGhost (SilentGhost)*![]() | Date: 2016-03-09 08:30 | |
Here is the updated patch. I only included the additional fix for read1 since readlines is not overwritten in the HTTPConnection. Not sure how to write test for it, does it need a much longer body (compared to the one in tests) to produce this behaviour?The larger refactoring might be appropriate for 3.6, but I believe these smaller fixed could go into the next micro release. | |||
| msg261409 -(view) | Author: Martin Panter (martin.panter)*![]() | Date: 2016-03-09 09:21 | |
Yes I agree it would be best to fix the bug(s) without major refactoring if practical.To fix the other bug, I think the parameters in read1(n) and readline(limit) need to be capped at self.length.The inherited BufferedIOBase.readlines() implementation should just call readline() multiple times. The point is that if you call readline() when self.length is reduced to zero, it should behave like EOF, rather than waiting for more data from the socket. To test this, you might be able to use the FakeSocket class. You might have to add a second HTTP response to the data, and test that readline() etc doesn’t read any of the second response. | |||
| msg261470 -(view) | Author: SilentGhost (SilentGhost)*![]() | Date: 2016-03-09 19:58 | |
All the highlighted issue are now fixed. The limit on n in read1 wasn't tested.Your suggestion regarding testing went a bit over my head, Martin. So, just trying to make sure we're on the same page. ExtendedReadTest, where I thought placing these new tests, is already employing FakeSocket, but I'm not sure how one would add a second response and to what. In any case, some of the code in that class seems rather specific, so it's not clear if it could or should be reused. | |||
| msg261622 -(view) | Author: Martin Panter (martin.panter)*![]() | Date: 2016-03-12 00:27 | |
To add a second response you would just concatenate it to the existing response. (Your existing test already uses FakeSocket.) The FakeSocket parameter just represents data that would be sent from the server. So:body = ( b"HTTP/1.1 200 OK\r\n" b"Content-Length: 3\r\n" b"\r\n" b"abc" b"HTTP/1.1 408 Next response should not be read yet\r\n")In fact, see the BasicTest.test_content_length_sync() case, which literally has "extradata" as that last line. I think we just need to adapt or duplicate this test to cover readline() and read1(), not just read(). Maybe also with read1(100), readline(100) cases as well. | |||
| msg261656 -(view) | Author: SilentGhost (SilentGhost)*![]() | Date: 2016-03-12 15:36 | |
OK, here is the patch including the tests that seem to exercise the behaviour. | |||
| msg261673 -(view) | Author: Martin Panter (martin.panter)*![]() | Date: 2016-03-13 06:08 | |
Thanks Silent Ghost, this patch looks pretty good. I did leave a couple more comments.Also I just realized that HTTPResponse.readline() and read1() are not really documented. The BufferedIOBase support was meant to be added in 3.5, although readline() was supposed to be supported earlier for urlopen() compatibility. | |||
| msg261695 -(view) | Author: SilentGhost (SilentGhost)*![]() | Date: 2016-03-13 11:38 | |
Updated patch addresses the rietveld comments. | |||
| msg261801 -(view) | Author: Martin Panter (martin.panter)*![]() | Date: 2016-03-15 05:59 | |
Here is a modified version of the tests that I propose to commit soon. I trimmed the size of the tests down and simplified some of them. I also mentioned that BufferedIOBase is supported in the documentation. | |||
| msg261842 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2016-03-16 07:20 | |
New changesete95e9701b472 by Martin Panter in branch '3.5':Issue#26499: Fixes to HTTPResponse.readline() and read1(), by Silent Ghosthttps://hg.python.org/cpython/rev/e95e9701b472New changeset74b3523d6256 by Martin Panter in branch 'default':Issue#26499: Merge HTTPResponse fix from 3.5https://hg.python.org/cpython/rev/74b3523d6256 | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:28 | admin | set | github: 70686 |
| 2016-03-16 08:39:57 | martin.panter | set | status: open -> closed resolution: fixed stage: commit review -> resolved |
| 2016-03-16 07:20:24 | python-dev | set | nosy: +python-dev messages: +msg261842 |
| 2016-03-15 05:59:49 | martin.panter | set | files: +issue26499_6.diff messages: +msg261801 stage: patch review -> commit review |
| 2016-03-13 11:38:45 | SilentGhost | set | files: +issue26499_5.diff messages: +msg261695 |
| 2016-03-13 06:08:19 | martin.panter | set | messages: +msg261673 |
| 2016-03-12 15:36:22 | SilentGhost | set | files: +issue26499_4.diff messages: +msg261656 |
| 2016-03-12 00:27:18 | martin.panter | set | messages: +msg261622 |
| 2016-03-09 19:58:22 | SilentGhost | set | files: +issue26499_3.diff messages: +msg261470 |
| 2016-03-09 09:21:58 | martin.panter | set | messages: +msg261409 |
| 2016-03-09 08:30:19 | SilentGhost | set | files: +issue26499_2.diff messages: +msg261405 |
| 2016-03-09 07:29:47 | martin.panter | set | nosy: +martin.panter messages: +msg261401 |
| 2016-03-07 17:49:15 | SilentGhost | set | files: +issue26499.diff versions: + Python 3.6 keywords: +3.5regression,patch nosy: +kristjan.jonsson,SilentGhost,serhiy.storchaka messages: +msg261309 stage: patch review |
| 2016-03-07 10:10:53 | maubp | create | |