
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2017-12-08 02:04 by yselivanov, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 4755 | merged | yselivanov, 2017-12-08 02:25 |
| PR 5408 | merged | yselivanov, 2018-01-29 04:35 |
Messages (16)

**msg307830** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-08 02:10
A couple of emails from async-sig for context:

1. https://mail.python.org/pipermail/async-sig/2017-October/000392.html
2. https://mail.python.org/pipermail/async-sig/2017-December/000423.html

I propose to add another Protocol base class to asyncio: BufferedProtocol. It will have 'get_buffer()' and 'buffer_updated(nbytes)' methods instead of 'data_received()':

    class asyncio.BufferedProtocol:

        def get_buffer(self) -> memoryview:
            pass

        def buffer_updated(self, nbytes: int):
            pass

When the protocol's transport is ready to receive data, it will call `protocol.get_buffer()`. The latter must return an object that implements the buffer protocol. The transport will request a writable buffer over the returned object and receive data *into* that buffer.

When the `sock.recv_into(buffer)` call is done, the `protocol.buffer_updated(nbytes)` method will be called, with the number of bytes received into the buffer passed as the first argument.

I've implemented the proposed design in uvloop (branch 'get_buffer', [1]) and adjusted your benchmark [2] to use it. Here are benchmark results from my machine (macOS):

- vanilla asyncio: 120-135 Mb/s
- uvloop: 320-330 Mb/s
- uvloop/get_buffer: 600-650 Mb/s

[1] https://github.com/MagicStack/uvloop/tree/get_buffer
[2] https://gist.github.com/1st1/1c606e5b83ef0e9c41faf21564d75ad7
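As a rough illustration of the call sequence described in the message above, here is a minimal sketch of an application-side class implementing the two proposed methods. The class name, buffer size, and `chunks` attribute are invented for this example, the signatures follow the proposal in this message (not necessarily the final asyncio API), and the class is not wired into an event loop here.

```python
# Illustrative sketch only: a protocol-style class with the two proposed
# callbacks. A transport would call get_buffer(), recv_into() the returned
# view, and then call buffer_updated(nbytes).
class SketchBufferedProtocol:
    def __init__(self):
        self._buffer = bytearray(64 * 1024)  # reused for every read
        self.chunks = []                     # copies of received data, for demo purposes

    def get_buffer(self):
        # Return a writable object supporting the buffer protocol;
        # the transport receives data directly *into* it.
        return memoryview(self._buffer)

    def buffer_updated(self, nbytes):
        # The first `nbytes` bytes of the buffer now contain fresh data.
        self.chunks.append(bytes(self._buffer[:nbytes]))
```

A transport-like caller would then do, conceptually: `buf = proto.get_buffer(); n = sock.recv_into(buf); proto.buffer_updated(n)`.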
**msg307832** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-08 02:48

I've made a PR that implements the change for selector_events.py. With the change:

- vanilla asyncio: 120-135 Mb/s
- vanilla asyncio/get_buffer: 220-230 Mb/s
- uvloop: 320-330 Mb/s
- uvloop/get_buffer: 600-650 Mb/s

If we decide to go forward with the proposed design, I'll update the PR with support for proactor_events.
**msg307841** - Author: Andrew Svetlov (asvetlov) - Date: 2017-12-08 09:34

The numbers are great! A couple of questions:

1. What happens if the size of the read data is greater than the pre-allocated buffer?
2. Is the flow control logic changed or not? If I understand correctly, pause_reading() / resume_reading() continue to work as before.
**msg307846** - Author: Antoine Pitrou (pitrou) - Date: 2017-12-08 10:51

I have another question: what happens if there is a partial read? For example, let's say I return a 1024-byte buffer in get_buffer(), but recv_into() receives data in 512-byte chunks. Is it:

1. get_buffer() is called, returns a 1024-byte buffer
2. recv_into() receives 512 bytes, writes them in buf[0:512]
3. recv_into() receives another 512 bytes, writes them in buf[512:1024]

or is it:

1. get_buffer() is called, returns a 1024-byte buffer
2. recv_into() receives 512 bytes, writes them in buf[0:512]
3. get_buffer() is called, returns another 1024-byte buffer
4. recv_into() receives 512 bytes, writes them in newbuf[0:512]
**msg307856** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-08 17:26

> 1. What happens if size of read data is greater than pre-allocated buffer?

Let's say we have 2Kb of data in the socket's network buffer, and we only preallocated 0.5Kb in our buffer. We will then receive 0.5Kb into our buffer, 'Protocol.buffer_updated()' will be called with nbytes=512, and 1.5Kb of data will be left in the network buffer. So the loop will call get_buffer()/buffer_updated() again, and the cycle will continue until there's no data left.

> 2. Is flow control logic changed or not? If I understand correctly pause_reading() / resume_reading() continue to work as earlier.

Flow control will continue to work exactly the same for BufferedProtocols.

> I have another question: what happens if there is a partial read?
> For example, let's say I return a 1024-byte buffer in get_buffer(), but recv_into() receives data in 512-byte chunks. Is it:

It will be as follows:

1. Protocol.get_buffer() is called, returns a 1024-byte buffer
2. recv_into() receives 512 bytes, writes them in buf[0:512]
3. Protocol.buffer_updated() is called with nbytes=512

Now it's the responsibility of the Protocol to return a correct view over the buffer the next time `get_buffer()` is called.

The general idea is to:

1. allocate a big buffer
2. keep track of how much data we have in that buffer, say with a 'length' integer
3. when get_buffer() is called, return 'memoryview(big_buffer)[length:]'
4. when buffer_updated(nbytes) is called, do 'length += nbytes; parse_buffer_if_possible()'

I've implemented precisely this approach here:
https://gist.github.com/1st1/1c606e5b83ef0e9c41faf21564d75ad7#file-get_buffer_bench-py-L27-L43
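The four-step idea above can be sketched as follows. The names are invented for illustration, and this is not the code from the linked gist:

```python
# Sketch of the big-buffer pattern: one preallocated buffer, a running
# 'length' counter, and get_buffer() returning a view over the unused tail.
class BigBufferProtocol:
    def __init__(self, size=256 * 1024):
        self._big_buffer = bytearray(size)  # step 1: allocate a big buffer
        self._length = 0                    # step 2: bytes of valid data so far

    def get_buffer(self):
        # Step 3: hand out a writable view over the unused tail.
        # (A real implementation would also grow the buffer when it fills up.)
        return memoryview(self._big_buffer)[self._length:]

    def buffer_updated(self, nbytes):
        # Step 4: account for the newly received bytes, then try to parse.
        self._length += nbytes
        self._parse_buffer_if_possible()

    def _parse_buffer_if_possible(self):
        # Application-specific: consume complete messages from
        # self._big_buffer[:self._length], compact the buffer, and
        # decrease self._length by the amount consumed.
        pass
```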
**msg307857** - Author: Antoine Pitrou (pitrou) - Date: 2017-12-08 17:31

Do you think it's possible to fold BufferedProtocol into Protocol? I.e., either `get_buffer()` returns None (the default) and `data_received()` is called with a bytes object, or `get_buffer()` returns a writable buffer and `buffer_updated()` is called with the number of bytes received into the buffer.

This would allow StreamReader to implement readinto().
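To make the suggested dispatch concrete, here is a rough sketch of how a transport's read callback could branch on the return value of get_buffer() under this idea. The function name and structure are invented for illustration, error handling is omitted, and (as the reply below notes) this folding was not adopted.

```python
import socket


# Illustrative only: not actual asyncio transport code.
def read_ready(sock: socket.socket, protocol, chunk_size: int = 65536) -> None:
    buf = protocol.get_buffer()           # None by default under this proposal
    if buf is None:
        data = sock.recv(chunk_size)      # classic path: copy into a bytes object
        if data:
            protocol.data_received(data)
    else:
        nbytes = sock.recv_into(buf)      # buffered path: receive directly into buf
        if nbytes:
            protocol.buffer_updated(nbytes)
```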
**msg307859** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-08 17:34

> Do you think it's possible to fold BufferedProtocol into Protocol?

It would be a backwards-incompatible change :( Coincidentally, there might be protocols that already implement 'get_buffer()' and 'buffer_updated()' methods that do something completely different.

> This would allow StreamReader to implement readinto().

We can easily refactor StreamReader to use 'BufferedProtocol'. Methods like 'readexactly()' would benefit from that.
**msg307861** - Author: Andrew Svetlov (asvetlov) - Date: 2017-12-08 17:57

The new protocol may speed up more than just `reader.readexactly()`. Every `reader.feed_data()` call appends a new chunk to the existing buffer. We could try to read into the unused tail of the buffer instead.
**msg307862** - Author: Łukasz Langa (lukasz.langa) - Date: 2017-12-08 18:23

+1 on the idea, I would use this.
**msg308019** - Author: Inada Naoki (methane) - Date: 2017-12-11 08:19

Looks nice. Can it speed up aiohttp too?
**msg308020** - Author: Andrew Svetlov (asvetlov) - Date: 2017-12-11 08:46

Yes. aiohttp uses its own streams, but the public API and internal implementation are pretty close to asyncio streams. Moreover, the C-accelerated HTTP parser should work with the proposed BufferedProtocol seamlessly.
**msg308027** - Author: Antoine Pitrou (pitrou) - Date: 2017-12-11 10:47

See https://eklitzke.org/goroutines-nonblocking-io-and-memory-usage for an interesting discussion of the drawbacks of some buffer handling idioms.
**msg308059** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-11 16:46

> See https://eklitzke.org/goroutines-nonblocking-io-and-memory-usage for an interesting discussion of the drawbacks of some buffer handling idioms.

Thanks for the link!

It does make sense to use a pool of buffers for the proposed BufferedProtocol when you need to keep thousands of long-open connections. The current design makes that easy: when BufferedProtocol.get_buffer() is called, you either take a buffer from the pool or allocate a temporary new one.

For use cases like DB connections (asyncpg), a permanently allocated buffer per protocol instance is a good solution too, as there's usually a fairly limited number of open DB connections.
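A hedged sketch of the pooling idea mentioned above; the class and names are invented for illustration. A protocol's get_buffer() would take a buffer from the pool when one is free and otherwise allocate a temporary one, as described:

```python
import collections


# Illustrative buffer pool: reuse preallocated buffers across many
# long-lived connections, fall back to temporary allocations when empty.
class BufferPool:
    def __init__(self, buffer_size=64 * 1024, max_pooled=1024):
        self._buffer_size = buffer_size
        self._max_pooled = max_pooled
        self._free = collections.deque()

    def acquire(self):
        # Reuse a pooled buffer if available; otherwise allocate a temporary
        # one that will simply be garbage-collected after use.
        if self._free:
            return self._free.popleft()
        return bytearray(self._buffer_size)

    def release(self, buf):
        # Bound idle memory by keeping at most max_pooled buffers around.
        if len(self._free) < self._max_pooled:
            self._free.append(buf)
```

A protocol would call pool.acquire() from get_buffer() and hand the buffer back via pool.release() once its contents have been consumed or the connection closes.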
**msg308060** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-11 16:50

>> Looks nice. Can it speed up aiohttp too?
>
> Yes. aiohttp uses its own streams but the public API and internal implementation are pretty close to asyncio streams. Moreover, the C-accelerated HTTP parser should work with the proposed BufferedProtocol seamlessly.

I did some benchmarks, and it looks like BufferedProtocol can make httptools up to 5% faster for relatively small requests (< 10Kb). Don't expect big speedups there.

For asyncpg, benchmarks that fetch a lot of data (50-100Kb) get up to 15% faster. So we'll definitely use the BufferedProtocol in asyncpg.

For applications that need to handle megabytes of data per request (like Antoine's benchmark), the speedup will be up to 2x.
**msg311002** - Author: Yury Selivanov (yselivanov) - Date: 2018-01-28 21:30

New changeset 631fd38dbf04dbf0127881f3977982e401a849e4 by Yury Selivanov in branch 'master':
bpo-32251: Implement asyncio.BufferedProtocol. (#4755)
https://github.com/python/cpython/commit/631fd38dbf04dbf0127881f3977982e401a849e4
**msg311055** - Author: Yury Selivanov (yselivanov) - Date: 2018-01-29 04:51

New changeset 07627e9a6a5f418354ff3dc99a0f36bc5b79dcd8 by Yury Selivanov in branch 'master':
bpo-32251: Fix docs (#5408)
https://github.com/python/cpython/commit/07627e9a6a5f418354ff3dc99a0f36bc5b79dcd8
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:55 | admin | set | github: 76432 |
| 2018-10-12 21:53:18 | cheryl.sabella | link | issue32052 dependencies |
| 2018-01-29 04:51:11 | yselivanov | set | messages: +msg311055 |
| 2018-01-29 04:35:34 | yselivanov | set | pull_requests: +pull_request5241 |
| 2018-01-28 21:43:53 | yselivanov | set | status: open -> closed; type: enhancement; resolution: fixed; stage: patch review -> resolved |
| 2018-01-28 21:30:29 | yselivanov | set | messages: +msg311002 |
| 2017-12-11 16:50:18 | yselivanov | set | messages: +msg308060 |
| 2017-12-11 16:46:57 | yselivanov | set | messages: +msg308059 |
| 2017-12-11 10:47:14 | pitrou | set | messages: +msg308027 |
| 2017-12-11 08:46:34 | asvetlov | set | messages: +msg308020 |
| 2017-12-11 08:19:01 | methane | set | nosy: +methane; messages: +msg308019 |
| 2017-12-08 18:23:45 | lukasz.langa | set | nosy: +lukasz.langa; messages: +msg307862 |
| 2017-12-08 18:05:20 | gvanrossum | set | nosy: -gvanrossum |
| 2017-12-08 17:57:44 | asvetlov | set | messages: +msg307861 |
| 2017-12-08 17:34:45 | yselivanov | set | messages: +msg307859 |
| 2017-12-08 17:31:28 | pitrou | set | messages: +msg307857 |
| 2017-12-08 17:26:04 | yselivanov | set | messages: +msg307856 |
| 2017-12-08 10:51:32 | pitrou | set | messages: +msg307846 |
| 2017-12-08 09:34:26 | asvetlov | set | messages: +msg307841 |
| 2017-12-08 02:48:22 | yselivanov | set | messages: +msg307832 |
| 2017-12-08 02:25:23 | yselivanov | set | keywords: +patch; stage: patch review; pull_requests: +pull_request4658 |
| 2017-12-08 02:10:39 | yselivanov | set | messages: +msg307830 |
| 2017-12-08 02:04:18 | yselivanov | create | |