The etree walker with implementation lxml.etree doesn't work when passed a full HTML document (having type lxml.etree._ElementTree).
To reproduce:
import html5lib
import lxml.etree
from html5lib.serializer import HTMLSerializer

def serialize(element, treebuilder, implementation=None):
    # Build a tree walker for the given tree type and serialize the tree to HTML.
    walker_cls = html5lib.getTreeWalker(treebuilder, implementation=implementation)
    walker = walker_cls(element)
    serializer = HTMLSerializer(omit_optional_tags=False)
    html = serializer.render(walker)
    print(html)

html = """<!DOCTYPE html>
<html>
<head>
  <title>foo</title>
</head>
<body>
  <p>a</p><p>b</p>
</body>
</html>"""

builder = html5lib.getTreeBuilder('lxml')
parser = html5lib.HTMLParser(builder, namespaceHTMLElements=False)
element = parser.parse(html)  # an lxml.etree._ElementTree, not an element

serialize(element, 'lxml')
serialize(element, 'etree', implementation=lxml.etree)
The last line fails with the following error:
Traceback (most recent call last):
  File "test-html5lib.py", line 98, in <module>
    parse_and_serialize(element, 'etree', implementation=lxml.etree)
  File "test-html5lib.py", line 79, in serialize
    html = serializer.render(walker)
  File "/.../python3.6/site-packages/html5lib/serializer.py", line 323, in render
    return "".join(list(self.serialize(treewalker)))
  File "/.../python3.6/site-packages/html5lib/serializer.py", line 209, in serialize
    for token in treewalker:
  File "/.../python3.6/site-packages/html5lib/treewalkers/base.py", line 128, in __iter__
    firstChild = self.getFirstChild(currentNode)
  File "/.../python3.6/site-packages/html5lib/treewalkers/etree.py", line 88, in getFirstChild
    if element.text:
AttributeError: 'lxml.etree._ElementTree' object has no attribute 'text'
The walker should probably first be calling root = element.getroot(). This seems to be on the same wavelength as the issue with treewalkers/etree.py I described in this comment: #338 (comment)
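
As a rough illustration of the getroot() idea, here is a minimal sketch of a caller-side unwrapping step. The helper name unwrap_root is hypothetical and is not part of html5lib. To stay self-contained the sketch uses the standard library's xml.etree.ElementTree, whose ElementTree wrapper likewise has getroot() but no text attribute; lxml.etree._ElementTree also provides getroot().

```python
import xml.etree.ElementTree as ET


def unwrap_root(node):
    """Return the root element if `node` is a tree wrapper, else `node` itself.

    Hypothetical helper (not an html5lib API): tree wrappers expose
    getroot(), while plain elements do not.
    """
    getroot = getattr(node, "getroot", None)
    return getroot() if callable(getroot) else node


# A tree wrapper around a small document, analogous to what the parser returns.
tree = ET.ElementTree(ET.fromstring("<html><body><p>a</p><p>b</p></body></html>"))
root = unwrap_root(tree)

print(hasattr(tree, "text"))  # the wrapper lacks .text, which is what the walker trips over
print(root.tag)               # the unwrapped root is an ordinary element
```

In the reproduction script, the equivalent would presumably be serialize(unwrap_root(element), 'etree', implementation=lxml.etree). I have not verified that this serializes cleanly, and since the doctype belongs to the tree rather than the root element, it would likely be missing from the output.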