Description
Assertion failures in Python 2 from the etree treewalker.
If I create an element directly using cElementTree and try to serialise the result using html5lib, I get assertion failures in Python 2 unless I go to great lengths to make sure cElementTree sees unicode strings everywhere.
    from xml.etree import cElementTree as etree
    import html5lib

    doc = html5lib.parse(u"<p>test", treebuilder="etree", namespaceHTMLElements=False)
    head = doc.find("head")
    link = etree.Element("link")
    head.append(link)
    stream = html5lib.treewalkers.getTreeWalker("etree")(doc)
    serializer = html5lib.serializer.htmlserializer.HTMLSerializer()
    rendered = serializer.render(stream)
The render() call fails with:
AssertionError: <type 'str'>
html5lib/treewalkers/etree.py:61 (getNodeDetails)
failing line: assert type(node.tag) == text_type, type(node.tag)
Using unicode string literals everywhere isn't enough to avoid trouble, because cElementTree sometimes constructs attribute names from keyword arguments, and keyword names are always byte strings (str) in Python 2, e.g.:
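    from xml.etree import cElementTree as etree
    import html5lib

    doc = html5lib.parse(u"<p>test", treebuilder="etree", namespaceHTMLElements=False)
    head = doc.find("head")
    # The keyword name "rel" becomes a byte-string attribute name in Python 2.
    link = etree.Element(u"link", rel=u"stylesheet")
    head.append(link)
    stream = html5lib.treewalkers.getTreeWalker("etree")(doc)
    serializer = html5lib.serializer.htmlserializer.HTMLSerializer()
    rendered = serializer.render(stream)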
The render() call fails with:
AssertionError
html5lib/serializer/htmlserializer.py:165 (encodeStrict)
failing line: assert(isinstance(string, text_type))
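For reference, here is a rough sketch of the kind of workaround I mean by "great lengths": walk the tree before serialising and decode any byte-string tags, attribute names, attribute values, text and tails. The helper names `_to_text` and `coerce_tree_to_text` are hypothetical (they are not part of html5lib), and the sketch assumes every byte string in the tree is ASCII.

```python
from xml.etree import ElementTree as etree

# The text type: `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def _to_text(value, encoding="ascii"):
    """Decode byte strings; return any other value unchanged."""
    if isinstance(value, bytes) and not isinstance(value, text_type):
        return value.decode(encoding)
    return value


def coerce_tree_to_text(root, encoding="ascii"):
    """Hypothetical helper: make every string in the tree a text string.

    Assumes all byte strings can be decoded with `encoding`.
    """
    for node in root.iter():
        # Comment and processing-instruction nodes have a non-string tag,
        # which _to_text leaves untouched.
        node.tag = _to_text(node.tag, encoding)
        node.text = _to_text(node.text, encoding)
        node.tail = _to_text(node.tail, encoding)
        # Rebuild the attributes so that both names and values are text.
        items = list(node.attrib.items())
        node.attrib.clear()
        for name, value in items:
            node.set(_to_text(name, encoding), _to_text(value, encoding))
    return root
```

Calling `coerce_tree_to_text(doc)` just before `getTreeWalker("etree")(doc)` should, under those assumptions, keep both assertions from firing, but having to do this for every tree built by hand is exactly the burden I would like to avoid.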