gh-81322: support multiple separators in Stream.readuntil #16429


Merged
gvanrossum merged 4 commits into python:main from bmerry:multi-separator on Apr 8, 2024

Conversation

@bmerry (Contributor) commented Sep 26, 2019 (edited by bedevere-app bot)

Allow Stream.readuntil to take an iterable of separators and match any
of them. The earliest match endpoint wins (which ensures that results
do not depend on the chunking of the incoming data), and on ties the
shortest separator wins (which only matters if the user has supplied a
redundant set like [b'\r\n', b'\n'] and the limit is reached).

It's also implemented for the deprecated StreamReader, just because the
code for the two implementations was the same except for one line and it
seemed easier to keep them in sync than leaving two different versions
to maintain.

https://bugs.python.org/issue37141
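For readers who want to see the intended semantics in action, here is a minimal sketch using a bare StreamReader fed by hand. It assumes Python 3.13+, where the feature eventually landed accepting a tuple of separators (this PR originally accepted any iterable); the data and separators are made up for illustration.

```python
import asyncio

async def main():
    # A bare StreamReader fed manually, just to illustrate the matching rule.
    reader = asyncio.StreamReader()
    reader.feed_data(b"abc;def|ghi;")
    reader.feed_eof()

    seps = (b";", b"|")
    # Earliest match endpoint wins: b";" ends at offset 4, before b"|" at 8.
    print(await reader.readuntil(seps))  # b'abc;'
    print(await reader.readuntil(seps))  # b'def|'
    print(await reader.readuntil(seps))  # b'ghi;'

asyncio.run(main())
```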

@bmerry (Contributor, Author) commented

A few questions about this:

  • I've implemented it for both StreamReader and Stream, because it seemed like having two identical (apart from _ensure_can_read) implementations is better for maintainability than two implementations that have drifted out of sync. But if one would prefer not to make any changes to StreamReader I can revert that. I suggest doing review discussion on the Stream implementation.
  • Because the existing functionality is implemented using the general case, there might be a very small loss of performance. Are there any benchmarks I should run to test that?
  • For matches that end in the same position I decided that the shortest should win for the purpose of LimitOverrunError, but it was an arbitrary choice. It could just as easily be longest-wins or first-in-list-wins.

break

# see upper comment for explanation.
offset = buflen + 1 - seplen
offset = buflen + 1 - max_seplen
@bmerry (Contributor, Author):

Hmm, I see this can make offset negative if max_seplen > min_seplen. I'll add a test to catch it and add a fix.
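For context, a tiny standalone sketch of a clamp that would prevent the negative offset mentioned above; buflen and max_seplen are stand-in values here, not the actual variables from the patch, and the exact fix that landed may differ.

```python
# Stand-in values: a short buffer and a longer longest-separator.
buflen = 2       # bytes currently in the buffer
max_seplen = 4   # length of the longest separator supplied

# Without a clamp this would be -1; clamping keeps the resume offset valid.
offset = max(0, buflen + 1 - max_seplen)
assert offset == 0
```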

@brandtbucher (Member) left a comment

Thanks for taking the time to draft this PR @bmerry, and welcome to CPython! 😎

I'm not super familiar with this module, but I found a couple of things that could be tidied up. Other than that, the code looks good! This will definitely need updated docs, though.

# Makes sure shortest matches wins, and supports arbitrary iterables
separator = sorted(separator, key=lambda sep: len(sep))
if not separator:
    raise ValueError('Separator list should contain at least one element')
@brandtbucher (Member):

See above.

Suggested change
raise ValueError('Separator list should contain at least one element')
raise ValueError('Separator should contain at least one element')

@@ -1672,26 +1709,35 @@ def _feed_data(self, data):
# messages :)

# `offset` is the number of bytes from the beginning of the buffer
# where there is no occurrence of `separator`.
# where there is no occurrence of any separator.
@brandtbucher (Member):

See above.

Suggested change
# where there is no occurrence of any separator.
# where there is no occurrence of any `separator`.

offset = 0

# Loop until we find `separator` in the buffer, exceed the buffer size,
# Loop until we find a separator in the buffer, exceed the buffer size,
@brandtbucher (Member):

See above.

Suggested change
# Loop until we find a separator in the buffer, exceed the buffer size,
# Loop until we find a `separator` in the buffer, exceed the buffer size,

    separator = [separator]
else:
    # Makes sure shortest matches wins, and supports arbitrary iterables
    separator = sorted(separator, key=lambda sep: len(sep))
@brandtbucher (Member):

Sorry, missed one. Here too!

Suggested change
separator = sorted(separator, key=lambda sep: len(sep))
separator = sorted(separator, key=len)
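For reference, a quick interpreter check (not from the patch) showing that the two spellings are equivalent and that sorting by length puts the shortest separator first, which is what the tie-breaking rule relies on:

```python
seps = [b"\r\n", b"\n"]
assert sorted(seps, key=lambda sep: len(sep)) == sorted(seps, key=len)
print(sorted(seps, key=len))  # [b'\n', b'\r\n'] -- shortest first
```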

@1st1 (Member) left a comment

Unfortunately it was decided to revert the current streams implementation from 3.8. See https://bugs.python.org/issue38242 for more details. I'm really sorry, but we'll need to rebase this work on the new API we later add to 3.9 :(

@bedevere-bot commented

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@bmerry (Contributor, Author) commented

@1st1 not a problem - definitely worth getting things right before they go into stdlib. Do you think the revert of the new stream API will conflict much with my changes to the old StreamReader class? If not, perhaps the way forward is for me to remove my changes to Stream, address the PR comments on StreamReader, and then hopefully it shouldn't need much work to rebase after Stream is removed?

@1st1 (Member) commented

@bmerry I'm not yet sure. I'll be working on a revert later today or tomorrow and will probably know more. I'd keep everything as is as it's not going to be an easy revert anyways.

@bmerry (Contributor, Author) commented

@1st1 it looks like you've done the Streams reversion now, and I've merged that into my branch. So hopefully this is ready for review now.

@brandtbucher thanks for the suggestions. I've applied all the ones that GitHub was still showing after the merge from master - let me know if anything got lost along the way.

@brandtbucher (Member) left a comment (edited)

One more tiny clarification from me, otherwise this looks good. Still needs updated docs though!

@bmerry (Contributor, Author) commented

Still needs updated docs though!

@brandtbucher which docs need updating?

bmerry requested a review from 1st1 on October 7, 2019, 15:51
@brandtbucher (Member) commented

Sorry about the delay @bmerry. The docs I'm referring to are located in Doc/library/asyncio-stream.rst. Specifically, here:

   .. coroutinemethod:: readuntil(separator=b'\\n')

      Read data from the stream until *separator* is found.

      On success, the data and separator will be removed from the
      internal buffer (consumed). Returned data will include the
      separator at the end.

      If the amount of data read exceeds the configured stream limit, a
      :exc:`LimitOverrunError` exception is raised, and the data
      is left in the internal buffer and can be read again.

      If EOF is reached before the complete separator is found,
      an :exc:`IncompleteReadError` exception is raised, and the internal
      buffer is reset.  The :attr:`IncompleteReadError.partial` attribute
      may contain a portion of the separator.

      .. versionadded:: 3.5.2

The signature and description should be updated. Also, something like this should be included below the .. versionadded:: 3.5.2 bit:

      .. versionchanged:: 3.9
         The *separator* parameter may now be an :term:`iterable` of separators.

@brandtbucher (Member) left a comment

One tiny change to the NEWS entry due to the API revert:

@bmerry (Contributor, Author) commented

The docs I'm referring to are located in Doc/library/asyncio-stream.rst

Thanks. I'd assumed the library docs were generated from the docstring, which is why I missed this. I'll update it.

@brandtbucher (Member) commented

Looks good to me! @1st1?

@1st1 (Member) commented Oct 9, 2019

We will soon start a discussion about the new streaming API for 3.9. I'll update this issue with a link when that happens; please give us some time before making a decision on this one -- Python 3.9 is still relatively far away.


gvanrossum changed the title from "bpo-37141: support multiple separators in Stream.readuntil" to "gh-81322: support multiple separators in Stream.readuntil" on Apr 7, 2024
@gvanrossum (Member) commented

Thanks, I will review in the coming week!

@gvanrossum (Member) left a comment

One comment nit, I'll merge that myself and then land it. Thanks for your contribution, and on behalf of the asyncio maintainers I apologize for the delay!

len(separator) would be the number of separators, which makes no sense.
gvanrossum merged commit 775912a into python:main on Apr 8, 2024
bmerry added a commit to bmerry/cpython that referenced this pull request on Apr 8, 2024
PR python#16429 introduced support for an iterable of separators in Stream.readuntil. Since bytes-like types are themselves iterable, this can introduce ambiguities in deciding whether the argument is an iterator of separators or a singleton separator. In python#16429, only 'bytes' was considered a singleton, but this will break code that passes other buffer object types.

The Python library docs don't indicate what separator types were permitted in Python <= 3.12, but comments in typeshed indicate that it would work with types that implement the buffer protocol and provide a len(). To keep those cases working the way they did before, I've changed the detection logic to consider any instance of collections.abc.Buffer as a singleton separator.

There may still be corner cases where this doesn't do what the user wants, e.g. a numpy array of byte strings will implement the buffer protocol and hence be treated as a singleton; but at least those corner cases should behave the same in 3.13 as they did in 3.12.

Relates to python#81322.
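A rough sketch of the detection rule described in that commit message, assuming Python 3.12+ for collections.abc.Buffer; the helper name _split_singleton is made up here and is not the code that landed.

```python
from collections.abc import Buffer  # available since Python 3.12

def _split_singleton(separator):
    """Sketch: treat any buffer-protocol object as a single separator."""
    if isinstance(separator, Buffer):
        # bytes, bytearray, memoryview, array.array, ... stay singletons.
        return [bytes(separator)]
    # Anything else is assumed to be an iterable of separators.
    return sorted((bytes(s) for s in separator), key=len)

print(_split_singleton(b"\r\n"))              # [b'\r\n']
print(_split_singleton(memoryview(b"\n")))    # [b'\n']
print(_split_singleton((b"\r\n", b"\n")))     # [b'\n', b'\r\n']
```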
bmerry deleted the multi-separator branch on April 8, 2024, 19:09
bmerry added a commit to bmerry/cpython that referenced this pull request on Apr 10, 2024
PR python#16429 introduced support for an iterable of separators in Stream.readuntil. Since bytes-like types are themselves iterable, this can introduce ambiguities in deciding whether the argument is an iterator of separators or a singleton separator. In python#16429, only 'bytes' was considered a singleton, but this will break code that passes other buffer object types.

Fix it by only supporting tuples rather than arbitrary iterables.

Closes python#117722.
gvanrossum pushed a commit that referenced this pull request on Apr 11, 2024
gh-16429 introduced support for an iterable of separators in Stream.readuntil. Since bytes-like types are themselves iterable, this can introduce ambiguities in deciding whether the argument is an iterator of separators or a singleton separator. In gh-16429, only 'bytes' was considered a singleton, but this will break code that passes other buffer object types.

Fix it by only supporting tuples rather than arbitrary iterables.

Closes gh-117722.
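In other words, after the follow-up fix only a tuple is interpreted as a collection of separators, while a plain bytes-like object keeps behaving as one separator even though bytes is itself iterable. A small sketch, assuming Python 3.13+:

```python
import asyncio

async def main():
    reader = asyncio.StreamReader()
    reader.feed_data(b"one\ntwo\r\nthree\n")
    reader.feed_eof()

    # Plain bytes: a single separator, never split into 1-byte separators.
    print(await reader.readuntil(b"\n"))             # b'one\n'

    # A tuple: multiple alternative separators.
    print(await reader.readuntil((b"\r\n", b"\n")))  # b'two\r\n'

asyncio.run(main())
```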
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request on Apr 17, 2024

… (python#117723)

pythongh-16429 introduced support for an iterable of separators in Stream.readuntil. Since bytes-like types are themselves iterable, this can introduce ambiguities in deciding whether the argument is an iterator of separators or a singleton separator. In pythongh-16429, only 'bytes' was considered a singleton, but this will break code that passes other buffer object types.

Fix it by only supporting tuples rather than arbitrary iterables.

Closes pythongh-117722.
Reviewers

@1st1 requested changes

@gvanrossum approved these changes

@brandtbucher approved these changes

@willingc — awaiting requested review (code owner)

Assignees
No one assigned
Labels
topic-asyncio, type-feature (A feature request or enhancement)
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

8 participants
@bmerry, @bedevere-bot, @1st1, @brandtbucher, @aeros, @gvanrossum, @the-knights-who-say-ni, @ezio-melotti
