Issue 13960

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Handling of broken comments in HTMLParser
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 3.3, Python 2.7

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: ezio.melotti
Nosy List: eric.araujo, ezio.melotti, python-dev
Priority: normal
Keywords: patch

Created on 2012-02-07 11:56 by ezio.melotti, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name         Uploaded                        Description
issue13960.diff   ezio.melotti, 2012-02-07 11:56  Patch against 3.2
Messages (8)
msg152806 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-02-07 11:56
html.parser fails to handle the following invalid comments:

<! foo >
<! bar -->
<! -- baz -->

The attached patch follows the HTML5 specs [0], and parses them as "bogus comments". Currently the patch fixes the problem only when strict=False, but it might be better to make this the default behavior and apply it to 2.7 too.

[0]: http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
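
For readers who want to try the three inputs from the report, here is a minimal sketch. It assumes a recent Python 3, where the fix is in place and HTMLParser no longer takes a strict argument. CommentCollector and collect_comments are hypothetical names used only for illustration; they are not part of the attached patch.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper: records every comment the parser reports."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called for regular comments and for "bogus comments" alike.
        self.comments.append(data)


def collect_comments(markup):
    """Feed markup to a fresh parser and return the reported comments."""
    parser = CommentCollector()
    parser.feed(markup)
    parser.close()
    return parser.comments


# The three invalid comments quoted in the report.
broken = "<! foo ><! bar --><! -- baz -->"
comments = collect_comments(broken)
for text in comments:
    print(repr(text))
```

On a fixed parser each of the three inputs should be reported through handle_comment rather than being dropped or raising an error.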
msg152861 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2012-02-08 14:28
LGTM.  What did our last discussion about following HTML5 rules for Python 2.7 lead to?  I don’t remember if we agreed that “3.3 is soon enough” or “let’s fix the bugs with HTML5 as reference”.
msg152869 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2012-02-08 15:30
After reading some emails again, I’m +1 on porting the fixes to 2.7.

1) We agree that HTML5 is the reference specification.
2) I don’t think there is sane code that would be broken if some previously unparsable page became parsable (HTML parsers could be an exception, but the obvious example, BeautifulSoup, does not use HTMLParser); HTMLParser is not a validating parser and never made any guarantee about the validity of handled pages.
3) Most people should be happy to have more pages handled by HTMLParser.
4) 2.7 is unique as long-term support, last 2.7 release.
msg153032 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-02-10 08:10
I'll fix this for 3.x non-strict and then see if it can be backported to 2.7 (there are still other fixes that should be backported to 2.7 before this can be applied).
msg153035 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-02-10 08:51
New changeset 242b697449d8 by Ezio Melotti in branch '3.2':
#13960: HTMLParser is now able to handle broken comments when strict=False.
http://hg.python.org/cpython/rev/242b697449d8

New changeset 44366541dd86 by Ezio Melotti in branch 'default':
#13960: merge with 3.2.
http://hg.python.org/cpython/rev/44366541dd86
msg153036 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-02-10 08:52
This is now fixed in 3.2/3.3, I'll wait for 2.7 before closing it.

On a side note, the empty <!> comment doesn't seem to be valid in HTML5. HTMLParser just ignores it, and doesn't report it as an empty comment (so this should be fine).
msg153271 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-02-13 14:10
New changeset 333e3acf2008 by Ezio Melotti in branch '2.7':
#13960: HTMLParser is now able to handle broken comments.
http://hg.python.org/cpython/rev/333e3acf2008
msg153272 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2012-02-13 14:14
I now backported this to 2.7, together with some improvements in the handling of declarations that I committed on 3.2 (4c4ff9fd19b6) and 3.3 (06a6fed0da56).

Apparently <!> is not a valid comment in HTML5, but it is considered a bogus comment and should still emit a "comment" with no content. This is now fixed too.
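
The final behavior for <!> described in the last message can be checked with a short sketch like the one below. It assumes a recent Python 3 with the fix applied; EmptyCommentProbe is a hypothetical name used only for illustration.

```python
from html.parser import HTMLParser


class EmptyCommentProbe(HTMLParser):
    """Hypothetical helper: keeps the text of each reported comment."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


probe = EmptyCommentProbe()
probe.feed("<!>")
probe.close()

# One comment event with empty content, as the last message describes.
comments = probe.comments
print(comments)
```

An empty string in the list means the parser treated <!> as a bogus comment with no content instead of silently skipping it.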
History
Date                 User          Action  Args
2022-04-11 14:57:26  admin         set     github: 58168
2012-02-13 14:14:59  ezio.melotti  set     status: open -> closed; resolution: fixed; messages: +msg153272; stage: patch review -> resolved
2012-02-13 14:10:58  python-dev    set     messages: +msg153271
2012-02-10 08:52:03  ezio.melotti  set     messages: +msg153036
2012-02-10 08:51:06  python-dev    set     nosy: +python-dev; messages: +msg153035
2012-02-10 08:10:04  ezio.melotti  set     messages: +msg153032
2012-02-08 15:30:34  eric.araujo   set     messages: +msg152869
2012-02-08 14:28:56  eric.araujo   set     messages: +msg152861
2012-02-08 12:19:53  ezio.melotti  set     assignee: ezio.melotti
2012-02-07 11:56:35  ezio.melotti  create