
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2017-12-08 02:04 by yselivanov, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 4755 | merged | yselivanov, 2017-12-08 02:25 |
| PR 5408 | merged | yselivanov, 2018-01-29 04:35 |
Messages (16)

**msg307830** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-08 02:10
A couple of emails from async-sig for context:

1. https://mail.python.org/pipermail/async-sig/2017-October/000392.html
2. https://mail.python.org/pipermail/async-sig/2017-December/000423.html

I propose to add another Protocol base class to asyncio: BufferedProtocol. It will have 'get_buffer()' and 'buffer_updated(nbytes)' methods instead of 'data_received()':

    class asyncio.BufferedProtocol:

        def get_buffer(self) -> memoryview:
            pass

        def buffer_updated(self, nbytes: int):
            pass

When the protocol's transport is ready to receive data, it will call `protocol.get_buffer()`. The latter must return an object that implements the buffer protocol. The transport will request a writable buffer over the returned object and receive data *into* that buffer.

When the `sock.recv_into(buffer)` call is done, the `protocol.buffer_updated(nbytes)` method will be called, with the number of bytes received into the buffer passed as the first argument.

I've implemented the proposed design in uvloop (branch 'get_buffer', [1]) and adjusted your benchmark [2] to use it. Here are benchmark results from my machine (macOS):

- vanilla asyncio: 120-135 Mb/s
- uvloop: 320-330 Mb/s
- uvloop/get_buffer: 600-650 Mb/s

[1] https://github.com/MagicStack/uvloop/tree/get_buffer
[2] https://gist.github.com/1st1/1c606e5b83ef0e9c41faf21564d75ad7
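As a rough illustration of the call sequence described in the message above, here is a minimal sketch of an application-side class implementing the two proposed methods. The class name, buffer size, and `chunks` attribute are invented for this example, the signatures follow the proposal in this message (not necessarily the final asyncio API), and the class is not wired into an event loop here.

```python
# Illustrative sketch only: a protocol-style class with the two proposed
# callbacks. A transport would call get_buffer(), recv_into() the returned
# view, and then call buffer_updated(nbytes).
class SketchBufferedProtocol:
    def __init__(self):
        self._buffer = bytearray(64 * 1024)  # reused for every read
        self.chunks = []                     # copies of received data, for demo purposes

    def get_buffer(self):
        # Return a writable object supporting the buffer protocol;
        # the transport receives data directly *into* it.
        return memoryview(self._buffer)

    def buffer_updated(self, nbytes):
        # The first `nbytes` bytes of the buffer now contain fresh data.
        self.chunks.append(bytes(self._buffer[:nbytes]))
```

A transport-like caller would then do, conceptually: `buf = proto.get_buffer(); n = sock.recv_into(buf); proto.buffer_updated(n)`.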
**msg307832** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-08 02:48

I've made a PR that implements the change for selector_events.py. With the change:

- vanilla asyncio: 120-135 Mb/s
- vanilla asyncio/get_buffer: 220-230 Mb/s
- uvloop: 320-330 Mb/s
- uvloop/get_buffer: 600-650 Mb/s

If we decide to go forward with the proposed design, I'll update the PR with support for proactor_events.
**msg307841** - Author: Andrew Svetlov (asvetlov) - Date: 2017-12-08 09:34

The numbers are great! A couple of questions:

1. What happens if the size of the read data is greater than the pre-allocated buffer?
2. Is the flow control logic changed or not? If I understand correctly, pause_reading() / resume_reading() continue to work as before.
**msg307846** - Author: Antoine Pitrou (pitrou) - Date: 2017-12-08 10:51

I have another question: what happens if there is a partial read? For example, let's say I return a 1024-byte buffer in get_buffer(), but recv_into() receives data in 512-byte chunks. Is it:

1. get_buffer() is called, returns a 1024-byte buffer
2. recv_into() receives 512 bytes, writes them in buf[0:512]
3. recv_into() receives another 512 bytes, writes them in buf[512:1024]

or is it:

1. get_buffer() is called, returns a 1024-byte buffer
2. recv_into() receives 512 bytes, writes them in buf[0:512]
3. get_buffer() is called, returns another 1024-byte buffer
4. recv_into() receives 512 bytes, writes them in newbuf[0:512]
**msg307856** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-08 17:26

> 1. What happens if size of read data is greater than pre-allocated buffer?

Let's say we have 2Kb of data in the socket's network buffer, and we only preallocated 0.5Kb in our buffer. We will then receive 0.5Kb into our buffer, 'Protocol.buffer_updated()' will be called with nbytes=512, and 1.5Kb of data will be left in the network buffer. So the loop will call get_buffer()/buffer_updated() again, and the cycle will continue until there's no data left.

> 2. Is flow control logic changed or not? If I understand correctly pause_reading() / resume_reading() continue to work as earlier.

Flow control will continue to work exactly the same for BufferedProtocols.

> I have another question: what happens if there is a partial read?
> For example, let's say I return a 1024-byte buffer in get_buffer(), but recv_into() receives data in 512-byte chunks. Is it:

It will be as follows:

1. Protocol.get_buffer() is called, returns a 1024-byte buffer
2. recv_into() receives 512 bytes, writes them in buf[0:512]
3. Protocol.buffer_updated() is called with nbytes=512

Now it's the responsibility of the Protocol to return a correct view over the buffer the next time `get_buffer()` is called.

The general idea is to:

1. allocate a big buffer
2. keep track of how much data we have in that buffer, say with a 'length' integer
3. when get_buffer() is called, return 'memoryview(big_buffer)[length:]'
4. when buffer_updated(nbytes) is called, do 'length += nbytes; parse_buffer_if_possible()'

I've implemented precisely this approach here:
https://gist.github.com/1st1/1c606e5b83ef0e9c41faf21564d75ad7#file-get_buffer_bench-py-L27-L43
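The four-step idea above can be sketched as follows. The names are invented for illustration, and this is not the code from the linked gist:

```python
# Sketch of the big-buffer pattern: one preallocated buffer, a running
# 'length' counter, and get_buffer() returning a view over the unused tail.
class BigBufferProtocol:
    def __init__(self, size=256 * 1024):
        self._big_buffer = bytearray(size)  # step 1: allocate a big buffer
        self._length = 0                    # step 2: bytes of valid data so far

    def get_buffer(self):
        # Step 3: hand out a writable view over the unused tail.
        # (A real implementation would also grow the buffer when it fills up.)
        return memoryview(self._big_buffer)[self._length:]

    def buffer_updated(self, nbytes):
        # Step 4: account for the newly received bytes, then try to parse.
        self._length += nbytes
        self._parse_buffer_if_possible()

    def _parse_buffer_if_possible(self):
        # Application-specific: consume complete messages from
        # self._big_buffer[:self._length], compact the buffer, and
        # decrease self._length by the amount consumed.
        pass
```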
**msg307857** - Author: Antoine Pitrou (pitrou) - Date: 2017-12-08 17:31

Do you think it's possible to fold BufferedProtocol into Protocol? I.e., either `get_buffer()` returns None (the default) and `data_received()` is called with a bytes object, or `get_buffer()` returns a writable buffer and `buffer_updated()` is called with the number of bytes received into the buffer.

This would allow StreamReader to implement readinto().
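To make the suggested dispatch concrete, here is a rough sketch of how a transport's read callback could branch on the return value of get_buffer() under this idea. The function name and structure are invented for illustration, error handling is omitted, and (as the reply below notes) this folding was not adopted.

```python
import socket


# Illustrative only: not actual asyncio transport code.
def read_ready(sock: socket.socket, protocol, chunk_size: int = 65536) -> None:
    buf = protocol.get_buffer()           # None by default under this proposal
    if buf is None:
        data = sock.recv(chunk_size)      # classic path: copy into a bytes object
        if data:
            protocol.data_received(data)
    else:
        nbytes = sock.recv_into(buf)      # buffered path: receive directly into buf
        if nbytes:
            protocol.buffer_updated(nbytes)
```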
**msg307859** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-08 17:34

> Do you think it's possible to fold BufferedProtocol into Protocol?

It would be a backwards-incompatible change :( Coincidentally, there might be protocols that already implement 'get_buffer()' and 'buffer_updated()' methods that do something completely different.

> This would allow StreamReader to implement readinto().

We can easily refactor StreamReader to use 'BufferedProtocol'. Methods like 'readexactly()' would benefit from that.
**msg307861** - Author: Andrew Svetlov (asvetlov) - Date: 2017-12-08 17:57

The new protocol may speed up more than just `reader.readexactly()`. Every `reader.feed_data()` call appends a new chunk to the existing buffer. We could try to read into the unused tail of the buffer instead.
**msg307862** - Author: Łukasz Langa (lukasz.langa) - Date: 2017-12-08 18:23

+1 on the idea, I would use this.
**msg308019** - Author: Inada Naoki (methane) - Date: 2017-12-11 08:19

Looks nice. Can it speed up aiohttp too?
**msg308020** - Author: Andrew Svetlov (asvetlov) - Date: 2017-12-11 08:46

Yes. aiohttp uses its own streams, but the public API and internal implementation are pretty close to asyncio streams. Moreover, the C-accelerated HTTP parser should work with the proposed BufferedProtocol seamlessly.
**msg308027** - Author: Antoine Pitrou (pitrou) - Date: 2017-12-11 10:47

See https://eklitzke.org/goroutines-nonblocking-io-and-memory-usage for an interesting discussion of the drawbacks of some buffer handling idioms.
**msg308059** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-11 16:46

> See https://eklitzke.org/goroutines-nonblocking-io-and-memory-usage for an interesting discussion of the drawbacks of some buffer handling idioms.

Thanks for the link!

It does make sense to use a pool of buffers for the proposed BufferedProtocol when you need to keep thousands of long-open connections. The current design makes that easy: when BufferedProtocol.get_buffer() is called, you either take a buffer from the pool or allocate a temporary new one.

For use cases like DB connections (asyncpg), a permanently allocated buffer per protocol instance is a good solution too, as there's usually a fairly limited number of open DB connections.
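A hedged sketch of the pooling idea mentioned above; the class and names are invented for illustration. A protocol's get_buffer() would take a buffer from the pool when one is free and otherwise allocate a temporary one, as described:

```python
import collections


# Illustrative buffer pool: reuse preallocated buffers across many
# long-lived connections, fall back to temporary allocations when empty.
class BufferPool:
    def __init__(self, buffer_size=64 * 1024, max_pooled=1024):
        self._buffer_size = buffer_size
        self._max_pooled = max_pooled
        self._free = collections.deque()

    def acquire(self):
        # Reuse a pooled buffer if available; otherwise allocate a temporary
        # one that will simply be garbage-collected after use.
        if self._free:
            return self._free.popleft()
        return bytearray(self._buffer_size)

    def release(self, buf):
        # Bound idle memory by keeping at most max_pooled buffers around.
        if len(self._free) < self._max_pooled:
            self._free.append(buf)
```

A protocol would call pool.acquire() from get_buffer() and hand the buffer back via pool.release() once its contents have been consumed or the connection closes.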
**msg308060** - Author: Yury Selivanov (yselivanov) - Date: 2017-12-11 16:50

>> Looks nice. Can it speed up aiohttp too?
>
> Yes. aiohttp uses its own streams but the public API and internal implementation are pretty close to asyncio streams. Moreover, the C-accelerated HTTP parser should work with the proposed BufferedProtocol seamlessly.

I did some benchmarks, and it looks like BufferedProtocol can make httptools up to 5% faster for relatively small requests (< 10Kb). Don't expect big speedups there.

For asyncpg, benchmarks that fetch a lot of data (50-100Kb) get up to 15% faster. So we'll definitely use the BufferedProtocol in asyncpg.

For applications that need to handle megabytes of data per request (like Antoine's benchmark), the speedup will be up to 2x.
**msg311002** - Author: Yury Selivanov (yselivanov) - Date: 2018-01-28 21:30

New changeset 631fd38dbf04dbf0127881f3977982e401a849e4 by Yury Selivanov in branch 'master':
bpo-32251: Implement asyncio.BufferedProtocol. (#4755)
https://github.com/python/cpython/commit/631fd38dbf04dbf0127881f3977982e401a849e4
**msg311055** - Author: Yury Selivanov (yselivanov) - Date: 2018-01-29 04:51

New changeset 07627e9a6a5f418354ff3dc99a0f36bc5b79dcd8 by Yury Selivanov in branch 'master':
bpo-32251: Fix docs (#5408)
https://github.com/python/cpython/commit/07627e9a6a5f418354ff3dc99a0f36bc5b79dcd8
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:55 | admin | set | github: 76432 |
| 2018-10-12 21:53:18 | cheryl.sabella | link | issue32052 dependencies |
| 2018-01-29 04:51:11 | yselivanov | set | messages: +msg311055 |
| 2018-01-29 04:35:34 | yselivanov | set | pull_requests: +pull_request5241 |
| 2018-01-28 21:43:53 | yselivanov | set | status: open -> closed; type: enhancement; resolution: fixed; stage: patch review -> resolved |
| 2018-01-28 21:30:29 | yselivanov | set | messages: +msg311002 |
| 2017-12-11 16:50:18 | yselivanov | set | messages: +msg308060 |
| 2017-12-11 16:46:57 | yselivanov | set | messages: +msg308059 |
| 2017-12-11 10:47:14 | pitrou | set | messages: +msg308027 |
| 2017-12-11 08:46:34 | asvetlov | set | messages: +msg308020 |
| 2017-12-11 08:19:01 | methane | set | nosy: +methane; messages: +msg308019 |
| 2017-12-08 18:23:45 | lukasz.langa | set | nosy: +lukasz.langa; messages: +msg307862 |
| 2017-12-08 18:05:20 | gvanrossum | set | nosy: -gvanrossum |
| 2017-12-08 17:57:44 | asvetlov | set | messages: +msg307861 |
| 2017-12-08 17:34:45 | yselivanov | set | messages: +msg307859 |
| 2017-12-08 17:31:28 | pitrou | set | messages: +msg307857 |
| 2017-12-08 17:26:04 | yselivanov | set | messages: +msg307856 |
| 2017-12-08 10:51:32 | pitrou | set | messages: +msg307846 |
| 2017-12-08 09:34:26 | asvetlov | set | messages: +msg307841 |
| 2017-12-08 02:48:22 | yselivanov | set | messages: +msg307832 |
| 2017-12-08 02:25:23 | yselivanov | set | keywords: +patch; stage: patch review; pull_requests: +pull_request4658 |
| 2017-12-08 02:10:39 | yselivanov | set | messages: +msg307830 |
| 2017-12-08 02:04:18 | yselivanov | create | |