html5lib/html5lib-pythonPublic

NotificationsYou must be signed in to change notification settings
Fork295
Star1.2k

Update docs#332

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

willkg merged 22 commits intohtml5lib:masterfromtwm:update-docs

Nov 6, 2017

Merged

Update docs#332

willkg merged 22 commits intohtml5lib:masterfromtwm:update-docs

Nov 6, 2017

Conversation

Copy link

Contributor

twm commentedApr 15, 2017

When trying to figure out how to sanitize some HTML I noticed that the docs have lagged behind the implementation. So I removed the mention ofHTMLSanitizer and then... things got out of hand. Summary of changes:

Addtox -e doc environment so that it's easy to build the docs.
Remove documentation of HTMLTokenizer, as it is now private.
Remove documentation of HTMLSanitizer, as it no longer exists.
Add basic documentation of the treeadapters package.
Linkify many references to classes, functions, and modules.
Fix various Sphinx warnings and formatting issues.
Add__version__ tohtml5lib.__all__
Remove sub-modules fromhtml5lib.treeadapters.__all__ (as this caused Sphinx warnings and didn't really make sense).

I didn't squash as suggested in CONTRIBUTING.md because recent PRs don't seem to be following that procedure. I am happy to do so if you like, or to try to break this into smaller PRs.

twm added14 commits

April 15, 2017 08:22

Fix formatting of docstring example

224d9f4

It runs together in the built HTML.

Use with, it's idiomatic

3fb6af3

Fix typo in changelog

ba63e09

Export and document html5lib.__version__

6b99d52

It's not much use if it's private.

Add a documentation env to tox.ini

323d736

Run "tox -e doc" to build the documentation in doc/_build.

Clean up html5lib module documentation

964d0e1

Right now the docs have entries for re-exports likehtml5lib.__init__.HTMLParser, including full class documentation. Thisis redundant with the docs for html5lib.html5parser.HTMLParser, whichis a public name anyway, so I think that it is best to be explicit thatthis is a re-export.

Remove docs for HTMLTokenizer and HTMLSanitizer

abf6224

HTMLTokenizer is now a private API (I cannot find a public export).HTMLSanitizer no longer exists as a tokenizer, and has been replacedwith a filter.

Fix Sphinx title underline warnings

8554098

Open in binary mode for Python 3

c8fca0e

Update and expand "moving parts" doc

637826f

Add treeadapters package doc

254fc90

Remove duplicate header

deb4206

Link to the spec

2909867

Add myself to AUTHORS

739dcf0

Copy link

Contributor

willkg commentedApr 15, 2017

The sanitizer still exists, but it got rewritten as a filter and is now inhtml5lib.filters.sanitizer. You can see it here:

https://github.com/html5lib/html5lib-python/blob/17499b9763a090f7715af49555d21fe4b558958b/html5lib/filters/sanitizer.py

Just a drive-by comment on the off chance it helps.

Copy link

ContributorAuthor

twm commentedApr 15, 2017

@willkg Yup, and the new form is documented. The old docs just weren't removed.

twm added a commit to twm/yarrharr that referenced this pull request

Jul 24, 2017

Preprocess and sanitize with html5lib

1625edb

Still lots more to do, as html5lib's sanitization likes to escape tagsinstead of dropping them.I ended up fixing up the html5lib docs while working on this:html5lib/html5lib-python#332

willkg added this to the1.0 milestone

Oct 3, 2017

willkg added2 commits

October 31, 2017 14:11

Merge branch 'master' into update-docs

cbaf304

Merge branch 'master' into update-docs

fc69044

willkg reviewed

Oct 31, 2017

View reviewed changes

Copy link

Contributor

willkg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm really sorry this has been sitting around so long. I appreciate you working on it and fixing so many issues!

I have some comments. If you have time, can you look through them? If not, let me know and I can work through them.

Thank you! Looking forward to landing this!

html5lib/__init__.py Outdated

		@@ -19,7 +28,8 @@
		from .serializer import serialize

		__all__ = ["HTMLParser", "parse", "parseFragment", "getTreeBuilder",
		"getTreeWalker", "serialize"]
		"getTreeWalker", "serialize", "__version__"]

Copy link

Contributor

willkgOct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't think we want__version__ to get imported when importing*.

Copy link

ContributorAuthor

twmNov 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As__version__ is part of the public interface of this module, which__all__ defines, I think it should be here.

Copy link

Contributor

willkgNov 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I wouldn't think about__all__ that way. It's really for specifying what gets exported if the user doesfrom html5lib import *. I don't want__version__ to get exported in that scenario. Please undo this change.

html5lib/__init__.py Outdated


		# this has to be at the top level, see how setup.py parses this
		#: Distribution version number, which asymptotically approaches 1.

Copy link

Contributor

willkgOct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd remove this. Amongst other things, it's not going to be true for much longer.

Copy link

ContributorAuthor

twmNov 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank goodness! Will do.

doc/movingparts.rst

		@@ -110,11 +105,11 @@ You can alter the stream content with filters provided by html5lib:
		the document

		* :class:`lint.Filter <html5lib.filters.lint.Filter>` raises
		``LintError`` exceptions on invalid tag and attribute names, invalid
		:exc:`AssertionError` exceptions on invalid tag and attribute names, invalid

Copy link

Contributor

willkgOct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is it really an AssertionError? If so, we should write up an issue to change that.

Copy link

ContributorAuthor

twmNov 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yeah, the implementation is basically allassert statements:

html5lib-python/html5lib/filters/lint.py

Lines 24 to 79 in7bbde54

	assertnamespaceisNoneorisinstance(namespace,text_type)
	assertnamespace!=""
	assertisinstance(name,text_type)
	assertname!=""
	assertisinstance(token["data"],dict)
	if (notnamespaceornamespace==namespaces["html"])andnameinvoidElements:
	asserttype=="EmptyTag"
	else:
	asserttype=="StartTag"
	iftype=="StartTag"andself.require_matching_tags:
	open_elements.append((namespace,name))
	for (namespace,name),valueintoken["data"].items():
	assertnamespaceisNoneorisinstance(namespace,text_type)
	assertnamespace!=""
	assertisinstance(name,text_type)
	assertname!=""
	assertisinstance(value,text_type)

	eliftype=="EndTag":
	namespace=token["namespace"]
	name=token["name"]
	assertnamespaceisNoneorisinstance(namespace,text_type)
	assertnamespace!=""
	assertisinstance(name,text_type)
	assertname!=""
	if (notnamespaceornamespace==namespaces["html"])andnameinvoidElements:
	assertFalse,"Void element reported as EndTag token: %(tag)s"% {"tag":name}
	elifself.require_matching_tags:
	start=open_elements.pop()
	assertstart== (namespace,name)

	eliftype=="Comment":
	data=token["data"]
	assertisinstance(data,text_type)

	eliftypein ("Characters","SpaceCharacters"):
	data=token["data"]
	assertisinstance(data,text_type)
	assertdata!=""
	iftype=="SpaceCharacters":
	assertdata.strip(spaceCharacters)==""

	eliftype=="Doctype":
	name=token["name"]
	assertnameisNoneorisinstance(name,text_type)
	asserttoken["publicId"]isNoneorisinstance(name,text_type)
	asserttoken["systemId"]isNoneorisinstance(name,text_type)

	eliftype=="Entity":
	assertisinstance(token["name"],text_type)

	eliftype=="SerializerError":
	assertisinstance(token["data"],text_type)

	else:
	assertFalse,"Unknown token type: %(type)s"% {"type":type}

doc/movingparts.rst Outdated



		Filters
		~~~~~~~

		You can alter the stream content withfilters provided by html5lib:
		html5lib provides severalfilters

Copy link

Contributor

willkgOct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Given that what follows is a bulleted list, can you add a: to the end of this line?

doc/movingparts.rst Outdated


		* :class:`~html5lib.serializer.HTMLSerializer`, to generate a stream of bytes; and
		* filters, to manipulate the token stream.

Copy link

Contributor

willkgOct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The leader has "a few tools", but this list has two items. Further, the items seem like a sentence. I'd either change the bullet list into a sentence or unsentencify the bullet items. Maybe something like this?:

html5lib provides two tools for consuming token streams:* :class:`~html5lib.serializer.HTMLSerializer` for generating a stream of bytes* filters for manipulating the token stream

Copy link

ContributorAuthor

twmNov 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I ended up going with a sentence, rather than the bulleted list, as there aren't exactly two (really there's the serializer and a bunch of filters), so it's really twocategories of token consumer, but that distinction isn't really useful to point out.

doc/html5lib.rst

		@@ -1,13 +1,8 @@
		html5lib Package
		================

		:mod:`html5lib` Package
		-----------------------

Copy link

Contributor

willkgOct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why take the header out here?

Copy link

ContributorAuthor

twmNov 2, 2017•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Otherwise there are two headers in a row with exactly the same text.

twm added5 commits

November 1, 2017 20:01

Add missing colon

f25d7c0

Rework token stream intro

5eb89cc

Merge remote-tracking branch 'upstream/master' into update-docs

d270666

Remove textual backticks in changelog

cb2702c

reST doesn't support nesting inline markup, so this shows up inrendered for with the backticks.

Asymptote no more

1084ed0

Copy link

ContributorAuthor

twm commentedNov 2, 2017

I think that I have addressed all the issues you noted as appropriate. Please let me know if anything else is required! Thanks!

willkg requested changes

Nov 2, 2017

View reviewed changes

Copy link

Contributor

willkg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I have one outstanding issue with__version__ being listed in__all__. Otherwise this is good to go. Thank you!

Remove __version__ from __all__

deb98bb

willkg approved these changes

Nov 6, 2017

View reviewed changes

willkg merged commit69606e5 intohtml5lib:master

Nov 6, 2017

Copy link

Contributor

willkg commentedNov 6, 2017

@twm Thank you so much for this!

willkg mentioned this pull request

Nov 9, 2017

Links to tokenizer.py and sanitizer.py broken#313

Closed

hugovk mentioned this pull request

Dec 4, 2017

update CHANGES.rst and AUTHORS.rst for 1.0 release#382

Closed

Labels

None yet

2 participants

Movatterモバイル変換

Update docs#332

Update docs#332

Uh oh!

Conversation

twm commentedApr 15, 2017

Uh oh!

willkg commentedApr 15, 2017

Uh oh!

twm commentedApr 15, 2017

Uh oh!

willkg left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

twmNov 2, 2017• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

twm commentedNov 2, 2017

Uh oh!

willkg left a comment

Choose a reason for hiding this comment

Uh oh!

willkg commentedNov 6, 2017

Uh oh!

Uh oh!

twmNov 2, 2017•
edited
Loading