Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-102555: Fix comment parsing in HTMLParser#135664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged

Conversation

@serhiy-storchaka
Copy link
Member

@serhiy-storchakaserhiy-storchaka commentedJun 18, 2025
edited by bedevere-appbot
Loading

  • "--!>" now ends the comment.
  • "-- >" no longer ends the comment.
  • Support abnormally ended empty comments "<-->" and "<--->".

* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".
@serhiy-storchakaserhiy-storchaka changed the titlegh-135661: Fix comment parsing in HTMLParsergh-102555: Fix comment parsing in HTMLParserJun 25, 2025
Copy link
Contributor

@Privat33r-devPrivat33r-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Solid implementation overall :)


# Internal -- parse comment, return length or -1 if not terminated
# see https://html.spec.whatwg.org/multipage/parsing.html#comment-start-state
defparse_comment(self,i,report=True):
Copy link
Contributor

@Privat33r-devPrivat33r-devJun 25, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I wonder if the change should be made in the_markupbase.

defparse_comment(self,i,report=1):
rawdata=self.rawdata
ifrawdata[i:i+4]!='<!--':
raiseAssertionError('unexpected call to parse_comment()')
match=_commentclose.search(rawdata,i+4)
ifnotmatch:
return-1
ifreport:
j=match.start(0)
self.handle_comment(rawdata[i+4:j])
returnmatch.end(0)

If the method is overloaded here, then there are no other use cases, and the original method becomes dead code.
https://github.com/search?q=repo%3Apython%2Fcpython%20parse_comment&type=code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it's fine to override the method here. Since_markupbase has been made internal/private in Python 3 and it's only used byhtml.parser, it makes sense to me to add new code directly tohtml.parser (and possibly even merging_markupbase intohtml.parser eventually).

Regarding the (now) dead code, we could either let it be, adding a comment noting that the method is unused/overridden, or delete it. The first two options are less destructive, but since the module is private there shouldn't be much concern about breaking backward compatibility (and if anyone is relying on the original implementation, they are probably using it throughHTMLParser anyway).

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

_markupbase can be used in third-party code (if it is also the base for the SGML parser or other parsers), so it is better to not touch it in maintained versions. This change can break it if SGML has other rules for comments. But in the developing branch we can remove it, after finishing all other bug fixes.

ezio-melotti reacted with thumbs up emoji

# Internal -- parse comment, return length or -1 if not terminated
# see https://html.spec.whatwg.org/multipage/parsing.html#comment-start-state
defparse_comment(self,i,report=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it's fine to override the method here. Since_markupbase has been made internal/private in Python 3 and it's only used byhtml.parser, it makes sense to me to add new code directly tohtml.parser (and possibly even merging_markupbase intohtml.parser eventually).

Regarding the (now) dead code, we could either let it be, adding a comment noting that the method is unused/overridden, or delete it. The first two options are less destructive, but since the module is private there shouldn't be much concern about breaking backward compatibility (and if anyone is relying on the original implementation, they are probably using it throughHTMLParser anyway).

@serhiy-storchakaserhiy-storchakaenabled auto-merge (squash)July 4, 2025 06:08
@serhiy-storchakaserhiy-storchaka merged commit8ac7613 intopython:mainJul 4, 2025
81 of 83 checks passed
@miss-islington-app
Copy link

Thanks@serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9, 3.10, 3.11, 3.12, 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull requestJul 4, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
@bedevere-app
Copy link

GH-136271 is a backport of this pull request to the3.14 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.14bugs and security fixes labelJul 4, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull requestJul 4, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
@bedevere-app
Copy link

GH-136272 is a backport of this pull request to the3.13 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.13bugs and security fixes labelJul 4, 2025
@bedevere-app
Copy link

GH-136273 is a backport of this pull request to the3.12 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.12only security fixes labelJul 4, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull requestJul 4, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
@bedevere-app
Copy link

GH-136274 is a backport of this pull request to the3.11 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.11only security fixes labelJul 4, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull requestJul 4, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
@bedevere-app
Copy link

GH-136275 is a backport of this pull request to the3.10 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull requestJul 4, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
@bedevere-appbedevere-appbot removed the needs backport to 3.10only security fixes labelJul 4, 2025
@bedevere-app
Copy link

GH-136276 is a backport of this pull request to the3.9 branch.

serhiy-storchaka added a commit that referenced this pull requestJul 4, 2025
…HTML5 standard (GH-135664) (GH-136272)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
serhiy-storchaka added a commit that referenced this pull requestJul 4, 2025
…HTML5 standard (GH-135664) (GH-136271)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
AndPuQing pushed a commit to AndPuQing/cpython that referenced this pull requestJul 11, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
ambv pushed a commit that referenced this pull requestJul 12, 2025
…TML5 standard (GH-135664) (GH-136276)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
ambv pushed a commit that referenced this pull requestJul 12, 2025
…HTML5 standard (GH-135664) (GH-136275)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
ambv pushed a commit that referenced this pull requestJul 12, 2025
…HTML5 standard (GH-135664) (GH-136274)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
ambv pushed a commit that referenced this pull requestJul 12, 2025
…HTML5 standard (GH-135664) (GH-136273)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Pranjal095 pushed a commit to Pranjal095/cpython that referenced this pull requestJul 12, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
picnixz pushed a commit to picnixz/cpython that referenced this pull requestJul 13, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
gentoo-bot pushed a commit to gentoo/cpython that referenced this pull requestJul 30, 2025
… the HTML5 standard (pythonGH-135664) (pythonGH-136276)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------(cherry picked from commit8ac7613)Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>Signed-off-by: Michał Górny <mgorny@gentoo.org>
taegyunkim pushed a commit to taegyunkim/cpython that referenced this pull requestAug 4, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
@serhiy-storchakaserhiy-storchaka deleted the htmlparser-comment branchAugust 14, 2025 12:21
Agent-Hellboy pushed a commit to Agent-Hellboy/cpython that referenced this pull requestAug 19, 2025
…TML5 standard (pythonGH-135664)* "--!>" now ends the comment.* "-- >" no longer ends the comment.* Support abnormally ended empty comments "<-->" and "<--->".---------Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@ezio-melottiezio-melottiezio-melotti approved these changes

+1 more reviewer

@Privat33r-devPrivat33r-devPrivat33r-dev left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

type-securityA security issue

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@serhiy-storchaka@ezio-melotti@Privat33r-dev

[8]ページ先頭

©2009-2025 Movatter.jp