Closed
Description
>>> html5lib.serializer.serialize(html5lib.parse('<p>&nbsp;</p>'))
'<p>\xa0'
At the moment, parsing and then serialising a document causes entities to be converted into special characters, including things like #00, and there is no way to pass additional entities to xml.sax.saxutils.escape.
I looked into subclassing the serialiser, but the escaping happens in the middle of the serialize() method at:
https://github.com/html5lib/html5lib-python/blob/master/html5lib/serializer/htmlserializer.py#L223
Perhaps the class should define an entities dict, so that the standard html5 entities and special characters can be passed through to the escape call, or do the escaping via a class method that can be overridden?
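To make the suggestion concrete, here is a rough sketch of what I have in mind. It is not html5lib's actual code: the class names, the `entities` attribute and the `escape_text` method are all hypothetical. The only real API used is the optional `entities` argument of `xml.sax.saxutils.escape`.

```python
from xml.sax.saxutils import escape


class SketchSerializer:
    """Hypothetical stand-in for the serialiser; not html5lib's real class."""

    # Hypothetical class-level mapping of characters to entity references.
    entities = {"\xa0": "&nbsp;"}

    def escape_text(self, text):
        # escape() handles &, < and > first, then applies the extra mapping,
        # so the "&" inside "&nbsp;" is not escaped a second time.
        return escape(text, self.entities)


class CopyrightSerializer(SketchSerializer):
    # A subclass can extend the mapping without touching serialize().
    entities = {**SketchSerializer.entities, "\xa9": "&copy;"}


if __name__ == "__main__":
    print(SketchSerializer().escape_text("\xa0 & <"))
    print(CopyrightSerializer().escape_text("\xa9\xa0"))
```

With a hook like this, serialize() would call `self.escape_text(...)` rather than calling `escape` directly, and a subclass could then either extend the dict or override the method.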