Closed
Description
Bug report
Bug description:
When HTMLParser is initialized with convert_charrefs=False, it behaves incorrectly when processing an invalid named entity reference (e.g., &A, which is not a valid HTML entity). The parser silently drops the & character and passes only the subsequent A to handle_data. I think this is a silent data loss problem.
    from html.parser import HTMLParser

    class MyParser(HTMLParser):
        def handle_data(self, data):
            print(f"handle_data received: {data!r}")

    parser_false = MyParser(convert_charrefs=False)
    parser_false.feed('&A')
    parser_false.close()

    handle_data received: 'A'
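One way to check for this kind of loss more generally is to reassemble the input from every text-level callback and compare it with what was fed in. The following is only a rough sketch: `RoundTripParser` and `round_trip` are hypothetical names, not part of `html.parser`, and the reassembly assumes each recognized reference ended with a semicolon. The result for `'&A'` depends on the Python version in use, so no output is shown for it.

```python
from html.parser import HTMLParser


class RoundTripParser(HTMLParser):
    """Hypothetical helper: collects text-level callbacks for reassembly."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_entityref(self, name):
        # Called for named references such as &amp; when convert_charrefs=False.
        # Assumption: the reference was terminated by a semicolon.
        self.pieces.append(f"&{name};")

    def handle_charref(self, name):
        # Called for numeric references such as &#62; or &#x3E;.
        self.pieces.append(f"&#{name};")


def round_trip(text):
    """Feed text through the parser and return the reassembled string."""
    parser = RoundTripParser()
    parser.feed(text)
    parser.close()
    return "".join(parser.pieces)


for sample in ("plain text", "a &amp; b", "&A"):
    result = round_trip(sample)
    status = "ok" if result == sample else "MISMATCH"
    print(f"{sample!r} -> {result!r} [{status}]")
```

A `MISMATCH` on the `'&A'` line would indicate that the parser dropped or altered characters for that input; the first two samples should round-trip unchanged.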
CPython versions tested on:
3.12
Operating systems tested on:
Linux