Closed
Description
http://code.google.com/p/html5lib/issues/detail?id=200
Reported by vovanec, Mar 6, 2012
A simple test case (my program has a more complex handler implementation, but the problem is reproducible with the default handler):
    import xml.sax.handler
    import html5lib

    def test(html):
        handler = xml.sax.handler.ContentHandler()
        parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder('dom'))
        dom = parser.parse(html)
        html5lib.treebuilders.dom.dom2sax(dom, handler)

    html = '<html xml:lang="en">'
    test(html)
With html5lib 0.95 it produces the following traceback:
    python test.py
    Traceback (most recent call last):
      File "test.py", line 13, in <module>
        test(html)
      File "test.py", line 10, in test
        html5lib.treebuilders.dom.dom2sax(dom, handler)
      File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 271, in dom2sax
        for child in node.childNodes: dom2sax(child, handler, nsmap)
      File "/home/vkuznets/packages/html5lib-0.95/html5lib-0.95/html5lib/treebuilders/dom.py", line 256, in dom2sax
        del attributes[(attr.namespaceURI, attr.nodeName)]
    KeyError: (None, u'xml:lang')
With previous versions (at least 0.11) there is no error. I assume this attribute may be invalid in the XML namespace, but even so I don't think it is OK for the parser to simply crash. I've seen a lot of real-world HTML documents that have such an attribute.
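To illustrate the kind of tolerant handling meant here, below is a minimal sketch using only the standard library's `xml.dom.minidom`. It is not the html5lib code: `discard_attribute` is a hypothetical helper, and the `attributes` dict is an assumed stand-in for whatever mapping `dom2sax` builds (the report does not show its contents). The sketch only shows that an attribute set as `xml:lang` yields the key from the traceback, and that `dict.pop` with a default does not raise where `del` does.

```python
from xml.dom import minidom


def discard_attribute(attributes, attr):
    """Hypothetical helper: remove attr's entry if present, never raise."""
    # dict.pop with a default returns None for a missing key,
    # whereas `del attributes[key]` raises KeyError.
    return attributes.pop((attr.namespaceURI, attr.nodeName), None)


# Build an element carrying a non-namespaced "xml:lang" attribute.
doc = minidom.Document()
element = doc.createElement("html")
element.setAttribute("xml:lang", "en")
attr = element.getAttributeNode("xml:lang")

# The same key shape that appears in the traceback.
key = (attr.namespaceURI, attr.nodeName)

# Assumption: the mapping simply lacks that key.
attributes = {}

try:
    del attributes[key]
    del_raised = False
except KeyError:
    del_raised = True

removed = discard_attribute(attributes, attr)
```

Whether silently skipping the attribute is the right behaviour for `dom2sax` is a separate question; the sketch only shows that the crash itself is avoidable.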
Tested it with Python 2.6.5, Linux
Please advise.
Thanks,
--Vladimir