gh-118350: Add escapable-raw-text mode to html parser #121770

Draft
timonviola wants to merge 10 commits into python:main from timonviola:fix-issue-118350

timonviola (Contributor) commented Jul 14, 2024 (edited by bedevere-app bot)

Escapable raw text elements are not handled by the current HTMLParser implementation.

This PR extends the existing parser with an additional mode to handle them correctly.

serhiy-storchaka (Member) left a comment

What is the difference between processing raw text elements and escapable raw text elements? I do not see any in this code.
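For context on the distinction being asked about: in the HTML standard, script and style are raw text elements (character references are not resolved), while title and textarea are escapable raw text elements (character references are resolved). A minimal sketch of the raw text side, using only the public html.parser API; the DataCollector class and the sample input are illustrative and not part of the PR:

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Illustrative helper: records every chunk passed to handle_data()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


parser = DataCollector()
parser.feed('<script>x = "&lt;";</script>')
parser.close()

# Inside <script>, "&lt;" is delivered verbatim rather than as "<".
script_text = "".join(parser.chunks)
print(repr(script_text))
```

In an escapable raw text element the same reference would be expected to arrive as "<", which is the behaviour the review asks the PR to implement.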

("data", content),
("endtag", element_lower)])

def test_escapable_raw_text_with_closing_tags(self):


Is it right? The test name is test_escapable_raw_text_with_closing_tags, but it tests the script element. It looks very similar to test_cdata_with_closing_tags.

timonviola reacted with eyes emoji
'<!-- not a comment --> &not-an-entity-ref;',
"<not a='start tag'>",
'<a href="" /> <p> <span></span>',
'foo = "</scr" + "ipt>";',


Why test this in the title and textarea elements?


Also add examples of valid character references and of an ambiguous ampersand.

timonviola reacted with thumbs up emoji
timonviola (Contributor, Author) commented

@ezio-melotti @serhiy-storchaka can you help with the review?

Comment on lines 411 to 414
('starttag', 'title', []), ('data', text),
('endtag', 'title'), ('data', '"'),
('starttag', 'textarea', []), ('data', text),
('endtag', 'textarea'), ('data', '"')]


This is not correct. Charrefs should be resolved in escapable raw text elements. The data should be '"X"X"' instead of text, except for an ambiguous ampersand.
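A small sketch of what "resolved" means here, using the standard library's html.unescape. The input strings are hypothetical (the PR's actual text value is not shown in this thread), and html.unescape only approximates the rules a parser would apply:

```python
import html

# Hypothetical element content made of three spellings of the double quote.
escaped = "&quot;X&#34;X&#x22;"
resolved = html.unescape(escaped)

# A bare "&" and an unknown named reference (an ambiguous ampersand)
# are left untouched.
ambiguous = "Timon & Pumba &xyzzy;"
untouched = html.unescape(ambiguous)

print(resolved)
print(untouched)
```

Under this reading, the expected data event for a title or textarea containing the first string would carry the resolved value, while the second string would pass through unchanged.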

@@ -317,6 +319,34 @@ def get_events(self):
("endtag", element_lower)],
collector=Collector(convert_charrefs=False))

def test_escapable_raw_text_content(self):


How does this test differ from test_cdata_content? BTW, most examples use JavaScript syntax and are only relevant for <script>.

@@ -28,6 +28,7 @@

starttagopen = re.compile('<[a-zA-Z]')
piclose = re.compile('>')
escapable_raw_text_close = re.compile('</(title|textarea)>', re.I)


Is it even used?

if self.cdata_elem:
break
j = n
if i < j:
- if self.convert_charrefs and not self.cdata_elem:
+ if self.convert_charrefs and not self.cdata_elem and not self.escapable_raw_text_elem:


This is incorrect. Charrefs should be resolved in an escapable raw text element, except for an ambiguous ampersand.

We also need tests for convert_charrefs=False in an escapable raw text element.
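A rough sketch of the shape such a test could take, written against the public html.parser API only. EventCollector and collect are illustrative stand-ins for the test suite's own Collector helper, and the exact split of data events may differ between Python versions, so the sketch only inspects the events it can rely on:

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Illustrative stand-in for the test suite's collector."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        self.events.append(("charref", name))


def collect(source, **kwargs):
    """Feed source to a fresh parser and return the recorded events."""
    parser = EventCollector(**kwargs)
    parser.feed(source)
    parser.close()
    return parser.events


# With convert_charrefs=False, the reference is reported through
# handle_entityref() instead of being folded into the data.
events = collect("<title>Timon &amp; Pumba</title>", convert_charrefs=False)
for event in events:
    print(event)
```

A real test would compare the full event list with an expected sequence, as the existing tests in this PR do.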

@@ -138,6 +141,14 @@ def get_starttag_text(self):
"""Return full source of start tag: '<...>'."""
return self.__starttag_text

def set_escapable_raw_text_mode(self, elem):


Since the behavior for raw text elements and escapable raw text elements is so similar, and they cannot be nested, why not use set_cdata_mode() and cdata_elem for both? Just add an optional boolean parameter to specify whether it is escapable (charrefs should be unescaped) or not.
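One way the suggestion could look, as a toy sketch rather than the actual HTMLParser code. MiniTextModeParser, feed_element, the _escapable attribute, and the escapable parameter name are all assumptions made for illustration:

```python
import html
import re

# Element names treated as escapable raw text in this sketch.
ESCAPABLE_RAW_TEXT = ("title", "textarea")


class MiniTextModeParser:
    """Toy model of a single text mode with an 'escapable' flag."""

    def __init__(self):
        self.cdata_elem = None
        self._escapable = False  # hypothetical attribute name
        self.events = []

    def set_cdata_mode(self, elem, escapable=False):
        # One mode for both element kinds; the flag records whether
        # character references should be unescaped.
        self.cdata_elem = elem.lower()
        self._escapable = escapable

    def clear_cdata_mode(self):
        self.cdata_elem = None
        self._escapable = False

    def feed_element(self, elem, source):
        """Consume the text that follows <elem> up to its closing tag."""
        self.set_cdata_mode(elem, escapable=elem.lower() in ESCAPABLE_RAW_TEXT)
        close = re.compile(r"</%s\s*>" % re.escape(self.cdata_elem), re.IGNORECASE)
        match = close.search(source)
        content = source[:match.start()] if match else source
        if self._escapable:
            content = html.unescape(content)
        self.events.append(("data", content))
        if match:
            self.events.append(("endtag", self.cdata_elem))
        self.clear_cdata_mode()


title_parser = MiniTextModeParser()
title_parser.feed_element("title", "Timon &amp; Pumba</title>")

script_parser = MiniTextModeParser()
script_parser.feed_element("script", "a &amp; b</script>")

print(title_parser.events)
print(script_parser.events)
```

The point of the sketch is that both element kinds share the same state (cdata_elem) and the same closing-tag search, and differ only in whether the collected text is unescaped.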

timonviola (Contributor, Author) replied

@serhiy-storchaka I can do that.

('entityref', 'amp'),
('data', ' Pumba')
],
collector=Collector(convert_charrefs=False),

timonviola (Contributor, Author) commented

Did you mean this test, @serhiy-storchaka?

serhiy-storchaka reacted with thumbs up emoji


Yes. Thanks.

timonviola reacted with thumbs up emoji
serhiy-storchaka marked this pull request as draft May 15, 2025 07:11
Reviewers
serhiy-storchaka left review comments
ezio-melotti (code owner): awaiting requested review

Assignees
ezio-melotti

Labels: none yet
Projects: none yet
Milestone: no milestone

3 participants: @timonviola, @serhiy-storchaka, @ezio-melotti
