Selected patches from Calibre #245
This seems to have broken 2.6 badly. Huh.
codecov-io commented May 7, 2016
Current coverage is 89.15%

| | master | #245 | diff |
|---|---|---|---|
| Files | 51 | 50 | -1 |
| Lines | 6817 | 6726 | -91 |
| Methods | 0 | 0 | |
| Messages | 0 | 0 | |
| Branches | 1316 | 1307 | -9 |
| Hits | 6172 | 5996 | -176 |
| Misses | 485 | 559 | +74 |
| Partials | 160 | 171 | +11 |
Oh, right, this is
kovidgoyal commented May 8, 2016
How do you suggest I override the application of attributes for html and body tags in my builder? Without those patches, it would require overriding the entire getPhases() method in html5parser.py. Remember that the problem those patches are solving is that there can be multiple

If you don't want to merge gsnedders/html5lib-python@a2d2e05, then how do you suggest I replace the stream input class? The one in html5lib is too slow. The only alternative I can see is monkey patching, which is less than optimal for obvious reasons.
@kovidgoyal I'll take a look at dealing with html/body attributes later (I'm literally about to board a plane). When it comes to the input stream, if it yields good perf increases when given a byte/unicode object we should just specialise them in html5lib.
Sure, you are welcome to take the input stream class from calibre for dealing with unicode objects. It is faster because it avoids wrapping the unicode in StringIO. And it actually implements tracking of positions. For my use case, that is important, since I need line and col numbers.
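To make the idea concrete, here is a minimal sketch of a stream that reads straight from an in-memory unicode string (no StringIO wrapper) and tracks line and column as it goes. The class name `TrackedTextStream` and its methods are hypothetical, chosen for illustration; this is not calibre's or html5lib's actual input stream class, and it ignores details such as newline normalisation.

```python
class TrackedTextStream:
    """Hypothetical sketch: read characters directly from a str while
    tracking the current line and column. Not calibre's real class."""

    def __init__(self, text):
        self._text = text
        self._offset = 0
        self._line = 1  # 1-based line number
        self._col = 0   # characters consumed on the current line

    def char(self):
        """Return the next character, or None at end of input."""
        if self._offset >= len(self._text):
            return None
        ch = self._text[self._offset]
        self._offset += 1
        if ch == "\n":
            self._line += 1
            self._col = 0
        else:
            self._col += 1
        return ch

    def position(self):
        """Return (line, col) for the most recently consumed character."""
        return self._line, self._col
```

Indexing the string directly keeps each read to a single subscript operation, and the position bookkeeping is two integer updates per character, so a tokenizer could report line and col numbers without a second pass over the input.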
See #119. CC @kovidgoyal.
This cherry-picks a few things from https://github.com/gsnedders/html5lib-python/commits/calibre-patches, which was a complete set of Calibre's patches from November 2013. https://github.com/kovidgoyal/calibre/commits/master/src/html5lib has very little changed in it since then, primarily a move to 0.999999-dev and a separate downstream fix for 0c551c9.
So, of those on that branch…
`True`/`False` cases it's likely slower, therefore failing at its stated goal, as it results in more byte code and `POP_JUMP_IF_FALSE` and `POP_JUMP_IF_TRUE` special-case the condition being `True` or `False` (oddly, they don't specialise `None`, though it is in `PyObject_IsTrue`; if that makes any notable performance difference then I'd suggest fixing that in CPython).
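One way to check the byte code claim is to disassemble both forms with the standard `dis` module. The sketch below assumes the patch in question swapped a plain truthiness test for an explicit identity check against `True`, which the text above does not spell out; the function names are made up for illustration, and exact opcode names vary between CPython versions.

```python
import dis


def truthiness(flag):
    # Plain truthiness test: the conditional jump consumes the value directly.
    if flag:
        return 1
    return 0


def identity(flag):
    # Explicit identity check: needs an extra constant load and comparison
    # before the conditional jump.
    if flag is True:
        return 1
    return 0


def opnames(func):
    """List the opcode names CPython generated for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


truthiness_ops = opnames(truthiness)
identity_ops = opnames(identity)
print(truthiness_ops)
print(identity_ops)
```

Note that the two forms are not equivalent (`1` is truthy but `1 is True` is false), so this only compares the cost of the test itself, not whether the substitution is safe.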