- Notifications
You must be signed in to change notification settings - Fork294
Update docs#332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Update docs#332
Uh oh!
There was an error while loading.Please reload this page.
Conversation
It runs together in the built HTML.
It's not much use if it's private.
Run "tox -e doc" to build the documentation in doc/_build.
Right now the docs have entries for re-exports likehtml5lib.__init__.HTMLParser, including full class documentation. Thisis redundant with the docs for html5lib.html5parser.HTMLParser, whichis a public name anyway, so I think that it is best to be explicit thatthis is a re-export.
HTMLTokenizer is now a private API (I cannot find a public export).HTMLSanitizer no longer exists as a tokenizer, and has been replacedwith a filter.
The sanitizer still exists, but it got rewritten as a filter and is now in Just a drive-by comment on the off chance it helps. |
@willkg Yup, and the new form is documented. The old docs just weren't removed. |
Still lots more to do, as html5lib's sanitization likes to escape tagsinstead of dropping them.I ended up fixing up the html5lib docs while working on this:html5lib/html5lib-python#332
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm really sorry this has been sitting around so long. I appreciate you working on it and fixing so many issues!
I have some comments. If you have time, can you look through them? If not, let me know and I can work through them.
Thank you! Looking forward to landing this!
html5lib/__init__.py Outdated
@@ -19,7 +28,8 @@ | |||
from .serializer import serialize | |||
__all__ = ["HTMLParser", "parse", "parseFragment", "getTreeBuilder", | |||
"getTreeWalker", "serialize"] | |||
"getTreeWalker", "serialize", "__version__"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I don't think we want__version__
to get imported when importing*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
As__version__
is part of the public interface of this module, which__all__
defines, I think it should be here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I wouldn't think about__all__
that way. It's really for specifying what gets exported if the user doesfrom html5lib import *
. I don't want__version__
to get exported in that scenario. Please undo this change.
html5lib/__init__.py Outdated
# this has to be at the top level, see how setup.py parses this | ||
#: Distribution version number, which asymptotically approaches 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'd remove this. Amongst other things, it's not going to be true for much longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thank goodness! Will do.
@@ -110,11 +105,11 @@ You can alter the stream content with filters provided by html5lib: | |||
the document | |||
* :class:`lint.Filter <html5lib.filters.lint.Filter>` raises | |||
``LintError`` exceptions on invalid tag and attribute names, invalid | |||
:exc:`AssertionError` exceptions on invalid tag and attribute names, invalid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Is it really an AssertionError? If so, we should write up an issue to change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yeah, the implementation is basically allassert
statements:
html5lib-python/html5lib/filters/lint.py
Lines 24 to 79 in7bbde54
assertnamespaceisNoneorisinstance(namespace,text_type) | |
assertnamespace!="" | |
assertisinstance(name,text_type) | |
assertname!="" | |
assertisinstance(token["data"],dict) | |
if (notnamespaceornamespace==namespaces["html"])andnameinvoidElements: | |
asserttype=="EmptyTag" | |
else: | |
asserttype=="StartTag" | |
iftype=="StartTag"andself.require_matching_tags: | |
open_elements.append((namespace,name)) | |
for (namespace,name),valueintoken["data"].items(): | |
assertnamespaceisNoneorisinstance(namespace,text_type) | |
assertnamespace!="" | |
assertisinstance(name,text_type) | |
assertname!="" | |
assertisinstance(value,text_type) | |
eliftype=="EndTag": | |
namespace=token["namespace"] | |
name=token["name"] | |
assertnamespaceisNoneorisinstance(namespace,text_type) | |
assertnamespace!="" | |
assertisinstance(name,text_type) | |
assertname!="" | |
if (notnamespaceornamespace==namespaces["html"])andnameinvoidElements: | |
assertFalse,"Void element reported as EndTag token: %(tag)s"% {"tag":name} | |
elifself.require_matching_tags: | |
start=open_elements.pop() | |
assertstart== (namespace,name) | |
eliftype=="Comment": | |
data=token["data"] | |
assertisinstance(data,text_type) | |
eliftypein ("Characters","SpaceCharacters"): | |
data=token["data"] | |
assertisinstance(data,text_type) | |
assertdata!="" | |
iftype=="SpaceCharacters": | |
assertdata.strip(spaceCharacters)=="" | |
eliftype=="Doctype": | |
name=token["name"] | |
assertnameisNoneorisinstance(name,text_type) | |
asserttoken["publicId"]isNoneorisinstance(name,text_type) | |
asserttoken["systemId"]isNoneorisinstance(name,text_type) | |
eliftype=="Entity": | |
assertisinstance(token["name"],text_type) | |
eliftype=="SerializerError": | |
assertisinstance(token["data"],text_type) | |
else: | |
assertFalse,"Unknown token type: %(type)s"% {"type":type} |
doc/movingparts.rst Outdated
Filters | ||
~~~~~~~ | ||
You can alter the stream content withfilters provided by html5lib: | ||
html5lib provides severalfilters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Given that what follows is a bulleted list, can you add a:
to the end of this line?
doc/movingparts.rst Outdated
* :class:`~html5lib.serializer.HTMLSerializer`, to generate a stream of bytes; and | ||
* filters, to manipulate the token stream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The leader has "a few tools", but this list has two items. Further, the items seem like a sentence. I'd either change the bullet list into a sentence or unsentencify the bullet items. Maybe something like this?:
html5lib provides two tools for consuming token streams:* :class:`~html5lib.serializer.HTMLSerializer` for generating a stream of bytes* filters for manipulating the token stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I ended up going with a sentence, rather than the bulleted list, as there aren't exactly two (really there's the serializer and a bunch of filters), so it's really twocategories of token consumer, but that distinction isn't really useful to point out.
@@ -1,13 +1,8 @@ | |||
html5lib Package | |||
================ | |||
:mod:`html5lib` Package | |||
----------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why take the header out here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Otherwise there are two headers in a row with exactly the same text.
reST doesn't support nesting inline markup, so this shows up inrendered for with the backticks.
I think that I have addressed all the issues you noted as appropriate. Please let me know if anything else is required! Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I have one outstanding issue with__version__
being listed in__all__
. Otherwise this is good to go. Thank you!
@twm Thank you so much for this! |
When trying to figure out how to sanitize some HTML I noticed that the docs have lagged behind the implementation. So I removed the mention of
HTMLSanitizer
and then... things got out of hand. Summary of changes:tox -e doc
environment so that it's easy to build the docs.__version__
tohtml5lib.__all__
html5lib.treeadapters.__all__
(as this caused Sphinx warnings and didn't really make sense).I didn't squash as suggested in CONTRIBUTING.md because recent PRs don't seem to be following that procedure. I am happy to do so if you like, or to try to break this into smaller PRs.