Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

WIP More general fix for #127 with addinfourl#136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
SimonSapin wants to merge2 commits intohtml5lib:masterfromSimonSapin:first-chunk

Conversation

@SimonSapin
Copy link
Contributor

Do not merge yet, this is incomplete.

As discussed in#134 I changed this to avoid.read(0) entirely and pass a first chunk toHTMLUnicodeInputStream andHTMLBinaryInputStream, but then I get lost in their implementation and I don’t know what to do with that first chunk.

Surprisingly, most tests still passed because.seek(0) is used in some places. I added a failing test with a non-seekable input.

@hoppipolla-critic-bot

Critic review:https://critic.hoppipolla.co.uk/r/498

This is an external review system which you may optionally use for the code review of your pull request.

In order to help critic track your changes, please donot make in-place history rewrites (e.g. viagit rebase -i orgit commit --amend) when updating this pull request.

@gsnedders
Copy link
Member

Urgh, looking at this again I must've been thinking of my somewhat-working-written-a-long-time-ago Python 3-only rewrite of the input stream. Dealing withfirstChunk is going to be hideously complex.

Alternative idea for fixing this (comments welcome, don't take this as me saying what the right approach is — I don't know what the right approach is!): make BufferedStream generic and work for both unicode and bytes file-like objects (literally the only thing specific is theb"".join at the end!), make sure it hides bugs whereread is broken (so it guaranteeslen(s.read(n)) <= n), and then wrapall non-seekable streams with it inHTMLInputStream, which then meansread(0) is safe for all non-seekable streams.

@gsnedders
Copy link
Member

The other option is to try and finish my long-abandoned Python 3-only rewrite of inputstream, create something with the same API (but slower when changing character encoding) for Python 2.https://github.com/gsnedders/html5lib-python/tree/new-inputstream is what there is. Hopefully shows the vague idea.

@SimonSapin
Copy link
ContributorAuthor

At the momentBufferedStream is only used when the input stream is non-seekable, so it doesn’t cover all cases.

@gsnedders
Copy link
Member

Hmm, looking ataddinfourl, I wonder if we should just detect the bug withif isinstance(source, http_client.HTTPResponse) or source.read == http_client.HTTPResponse. This would handle both any cases of people subclassingHTTPResponse as well as handlingaddinfourl. Do we actually want to work around more general cases of people breaking the file-like object API contract? I'd argue not… This is only really special in that it is stdlib code breaking it.

@SimonSapin
Copy link
ContributorAuthor

At one point I hadif isinstance(six.get_method_self(source.read), http_client.HTTPResponse), but that broke on objects wheresource.read was not a method.

@gsnedders
Copy link
Member

Is there any reason not to just go for the equality check?

@SimonSapin
Copy link
ContributorAuthor

Do you meanif source.read == http_client.HTTPResponse.read? That doesn’t work,source.read is a bound method.

@gsnedders
Copy link
Member

So, what,get_method_self would raiseAttributeError, no? Can we not just check for that?

@SimonSapin
Copy link
ContributorAuthor

We can, it’s just one more corner case piling up…

@SimonSapin
Copy link
ContributorAuthor

How about this?

# Work around Python bug #20007: read(0) closes the connection.if (isinstance(source, http_client.HTTPResponse) or     # Also check for addinfourl wrapping HTTPResponse    isinstance(getattr(source, 'fp', None), http_client.HTTPResponse)):

It should never raise, and cover every case where we’ve seen 20007 so far.

@gsnedders
Copy link
Member

I guess it is as good as anything. As ugly as it is.

@gsnedders
Copy link
Member

I'm so confused as to what's happened here. Why did this get dropped? Was I expecting you to update PR with that? I don't know. Looking over this again,get_method_self seems sanest, along with moving it within the followingif. Closing this to replace it with#186.

@gsnedders
Copy link
Member

Oh, no, I'm being stupid. I'm thinking about JS semantics and not Python ones!

gsnedders added a commit to gsnedders/html5lib-python that referenced this pull requestApr 26, 2015
gsnedders added a commit to gsnedders/html5lib-python that referenced this pull requestDec 12, 2015
gsnedders added a commit to gsnedders/html5lib-python that referenced this pull requestDec 12, 2015
gsnedders added a commit to gsnedders/html5lib-python that referenced this pull requestDec 12, 2015
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@SimonSapin@hoppipolla-critic-bot@gsnedders

[8]ページ先頭

©2009-2025 Movatter.jp