pygettext: use an AST parser instead of a tokenizer #104400

Closed

bedevere-bot added the awaiting review label

tomasr8 mentioned this pull request

gh-104400: pygettext: use an AST parser instead of a tokenizer #104402

Merged

Member

AA-Turner commented Aug 20, 2023

@tomasr8 Thanks for opening the new PR! Could you look at the failing tests please?

======================================================================
FAIL: test_pygettext_output (test.test_tools.test_i18n.test_i18n.Test_pygettext.test_pygettext_output) [Input file: data/messages.py]
Test that the pygettext output exactly matches a file.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 350, in test_pygettext_output
    self.assert_POT_equal(expected, output)
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 75, in assert_POT_equal
    self.assertEqual(expected, actual)
AssertionError: '# SO[397 chars]rset=UTF-8\\n"\n"Content-Transfer-Encoding: 8b[682 chars]\n\n' != '# SO[397 chars]rset=cp1252\\n"\n"Content-Transfer-Encoding: 8[683 chars]\n\n'
Diff is 1325 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_pygettext_output (test.test_tools.test_i18n.test_i18n.Test_pygettext.test_pygettext_output) [Input file: data/docstrings.py]
Test that the pygettext output exactly matches a file.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 350, in test_pygettext_output
    self.assert_POT_equal(expected, output)
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 75, in assert_POT_equal
    self.assertEqual(expected, actual)
AssertionError: '# SO[397 chars]rset=UTF-8\\n"\n"Content-Transfer-Encoding: 8b[340 chars]\n\n' != '# SO[397 chars]rset=cp1252\\n"\n"Content-Transfer-Encoding: 8[341 chars]\n\n'
Diff is 956 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_pygettext_output (test.test_tools.test_i18n.test_i18n.Test_pygettext.test_pygettext_output) [Input file: data/fileloc.py]
Test that the pygettext output exactly matches a file.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 350, in test_pygettext_output
    self.assert_POT_equal(expected, output)
  File "D:\a\cpython\cpython\Lib\test\test_tools\test_i18n\test_i18n.py", line 75, in assert_POT_equal
    self.assertEqual(expected, actual)
AssertionError: '# SO[397 chars]rset=UTF-8\\n"\n"Content-Transfer-Encoding: 8b[294 chars]\n\n' != '# SO[397 chars]rset=cp1252\\n"\n"Content-Transfer-Encoding: 8[295 chars]\n\n'
Diff is 907 characters long. Set self.maxDiff to None to see it.

tomasr8 added 2 commits

August 20, 2023 19:28

Specify file encoding

b1b0892

Normalize charset

eb7f488

Member, Author

tomasr8 commented Aug 20, 2023

Hmm, looks like an encoding issue. Hopefully this'll fix it.

Member

AA-Turner commented Aug 20, 2023

Looks like the same three tests failed. Do you have access to a Windows computer? If not I should be able to have a look later on.

AA-Turner added the skip news label

Member, Author

tomasr8 commented Aug 20, 2023 (edited)

No worries! Luckily I have a Windows machine lying around 😅 The problem is that there's no way to specify the output encoding, so pygettext uses the platform default. That makes the files hard to compare, because the charset is also part of the header. I'll just normalize it the same way I do the creation date.
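
The normalization described here could look roughly like the sketch below. The name `normalize_pot_header` and the placeholder values are assumptions made for illustration, not the code from this PR; the sketch only rewrites the two header fields that depend on when and where pygettext ran.

```python
import re

# Header fields whose values depend on the time and platform of the run.
# Each value ends at the literal backslash of the trailing "\n" escape.
_CREATION_DATE = re.compile(r'(POT-Creation-Date: )[^\\"]*')
_CHARSET = re.compile(r'(charset=)[^\\"]*')


def normalize_pot_header(pot_text: str) -> str:
    """Replace the creation date and charset with fixed placeholders."""
    pot_text = _CREATION_DATE.sub(r'\g<1>2000-01-01 00:00+0000', pot_text)
    pot_text = _CHARSET.sub(r'\g<1>UTF-8', pot_text)
    return pot_text


# Two header fragments that differ only in date and charset (made-up values).
windows_run = (
    '"POT-Creation-Date: 2023-08-20 19:28+0200\\n"\n'
    '"Content-Type: text/plain; charset=cp1252\\n"\n'
)
other_run = (
    '"POT-Creation-Date: 2023-08-20 17:30+0000\\n"\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
)
assert normalize_pot_header(windows_run) == normalize_pot_header(other_run)
```

Applying the same function to both the expected snapshot and the generated output lets an exact string comparison ignore those two fields.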

AA-Turner reviewed

Aug 21, 2023

Member

AA-Turner left a comment

Thanks! Some comments on the test code:

Lib/test/test_tools/test_i18n/test_i18n.py (outdated, resolved)

tomasr8 and others added 2 commits

August 21, 2023 20:20

Apply suggestions from code review

7428393

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>

Apply suggestions from code review

f06cbb5

AA-Turner added the tests (Tests in the Lib/test dir) label

Aug 28, 2023

AA-Turner approved these changes

Aug 28, 2023

gettext: remove unecessary test cases testing single/double quotes #107510

Member

AA-Turner left a comment

Thanks!

bedevere-bot added awaiting core review and removed awaiting review labels

Aug 28, 2023

Member, Author

tomasr8 commented Aug 29, 2023

Thanks!

Thanks for the review!

erlend-aasland requested a review from warsaw

August 30, 2023 07:42

Contributor

erlend-aasland commented Aug 30, 2023

cc @warsaw, who asked for a ping on Discourse :)

tomasr8 mentioned this pull request

Oct 9, 2023

Closed

Merge branch 'main' into pygettext-tests

f4b7955

serhiy-storchaka self-requested a review

October 9, 2023 16:25

erlend-aasland and others added 5 commits

December 4, 2023 11:53

Merge branch 'main' into pygettext-tests

c6cb8b9

Merge branch 'main' into pygettext-tests

6a76d97

Merge branch 'main' into pygettext-tests

ebcc6ea

Merge branch 'main' into pygettext-tests

9dbc1c7

Merge branch 'main' into pygettext-tests

1c3d46a

serhiy-storchaka reviewed

Oct 28, 2024

Lib/test/test_tools/test_i18n/test_i18n.py Outdated

		"""Tests to cover the Tools/i18n package"""

		import os
		from pathlib import Path

Member

serhiy-storchaka Oct 28, 2024

Is it necessary to use pathlib? Other tests simply use os.path.

Member, Author

tomasr8 Oct 28, 2024

There were only a handful of uses of os, so I went ahead and replaced them with pathlib, which I think improves readability, but I'm happy to revert the change if you prefer to keep os :)
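
To make the trade-off concrete, here is the same small task written both ways: deriving the `.pot` snapshot path that belongs to an input file. Both function names are hypothetical and exist only for this comparison.

```python
import os
from pathlib import Path


def pot_path_with_os(input_file: str) -> str:
    """Swap the extension for .pot using os.path helpers."""
    root, _ext = os.path.splitext(input_file)
    return root + '.pot'


def pot_path_with_pathlib(input_file: str) -> str:
    """Swap the extension for .pot using pathlib."""
    return str(Path(input_file).with_suffix('.pot'))


example = os.path.join('data', 'messages.py')
assert pot_path_with_os(example) == pot_path_with_pathlib(example)
```

Both versions give the same result; the difference is purely one of style.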

Lib/test/test_tools/test_i18n/test_i18n.py (outdated, resolved)

Lib/test/test_tools/test_i18n/__init__.py (outdated, resolved)

tomasr8 added 5 commits

October 28, 2024 21:23

Add a CLI command to regenerate snapshots

5fba1bb

Regenerate snapshots

9f388af

Simplify code

88f6350

Set maxDiff to None

f4ed4e4

Add test dir to Makefile

63eef00

tomasr8 requested a review from erlend-aasland as a code owner

October 28, 2024 20:31

Member, Author

tomasr8 commented Oct 28, 2024 (edited)

I also added a --snapshot-update CLI argument to make it easy to regenerate the snapshots (as is already the case with some ast and, recently, argparse tests).
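
A minimal sketch of how such a flag can be handled in a test module follows. The `update_snapshots` function is a hypothetical stand-in for the code that rewrites the `.pot` files, and the structure is an assumption rather than a copy of the merged change.

```python
import unittest


def update_snapshots():
    """Hypothetical stand-in for the code that rewrites the .pot files."""
    print('snapshots regenerated')


def main(argv):
    """Regenerate snapshots when asked to, otherwise run the tests."""
    if len(argv) > 1 and argv[1] == '--snapshot-update':
        update_snapshots()
        return 0
    # The custom flag is checked first because unittest's own argument
    # parser does not know about it.
    unittest.main(argv=argv)
```

In the test module this would be called as `sys.exit(main(sys.argv))` under the usual `if __name__ == '__main__':` guard, so a plain run of the file still executes the tests.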

tomasr8 requested a review from serhiy-storchaka

November 3, 2024 12:14

serhiy-storchaka reviewed

Lib/test/test_tools/test_i18n.py Outdated

		def update_POT_snapshots():
		    for input_file in DATA_DIR.glob('*.py'):
		        output_file = input_file.with_suffix('.pot')
		        contents = input_file.read_text(encoding='utf-8')

Member

serhiy-storchaka Nov 3, 2024

It would be nice to have some files with non-UTF-8 encoding.

Since contents is only used to copy a file, you can read/write the binary content.
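
The suggestion can be sketched as follows. Copying bytes never decodes the file, so it works for input in any source encoding; the `copy_source` helper and the sample file are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def copy_source(input_file: Path, target_dir: Path) -> Path:
    """Copy a source file byte for byte, whatever its encoding."""
    target = target_dir / input_file.name
    target.write_bytes(input_file.read_bytes())
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / 'latin1.py'
    # A Latin-1 encoded module; decoding these bytes as UTF-8 would raise.
    source.write_bytes("# -*- coding: latin-1 -*-\n_('caf\xe9')\n".encode('latin-1'))
    work_dir = Path(tmp) / 'work'
    work_dir.mkdir()
    copied = copy_source(source, work_dir)
    assert copied.read_bytes() == source.read_bytes()
```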

Lib/test/test_tools/test_i18n.py Outdated

		with temp_cwd(None):
		    Path(input_file.name).write_text(contents)
		    assert_python_ok(Test_pygettext.script, '--docstrings', input_file.name)
		    output = Path('messages.pot').read_text()

Member

serhiy-storchaka Nov 3, 2024

When you read text, always specify the encoding.

Member, Author

tomasr8 Nov 3, 2024

This causes problems on Windows, where the encoding is cp1252, so reading it back as utf8 fails. I don't know how else to get around this besides forcing pygettext to always output utf8 (or adding a configurable parameter). Do you have any suggestions?
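
The failure mode can be reproduced without Windows by encoding and decoding explicitly. This is only an illustration of the mismatch; `survives_round_trip` is a hypothetical helper.

```python
def survives_round_trip(text: str, write_encoding: str, read_encoding: str) -> bool:
    """Return True if text written with one encoding reads back intact with another."""
    data = text.encode(write_encoding)
    try:
        return data.decode(read_encoding) == text
    except UnicodeDecodeError:
        return False


# Pure ASCII is identical in cp1252 and UTF-8, so the mismatch goes unnoticed.
assert survives_round_trip('msgid "Hello"', 'cp1252', 'utf-8')
# 'é' is the single byte 0xE9 in cp1252, which is not valid UTF-8 on its own.
assert not survives_round_trip('msgid "café"', 'cp1252', 'utf-8')
```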

Member

serhiy-storchaka Nov 3, 2024

We should use -Xutf8 or PYTHONIOENCODING=utf-8 to run pygettext, because the text can be non-encodable with the locale encoding.
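
As a rough sketch of the `-X utf8` approach: the child script below is a stand-in for pygettext (an assumption for the sake of a self-contained example) that writes a file without naming an encoding. Running it in UTF-8 mode makes the default encoding UTF-8, so the parent can read the output with an explicit `encoding='utf-8'`. The sketch uses `subprocess` rather than the `assert_python_ok` helper from the test suite.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for pygettext: writes a file without specifying an encoding,
# so the encoding comes from the interpreter's default.
CHILD_SCRIPT = (
    "with open('messages.pot', 'w') as f:\n"
    "    f.write('msgid \"caf\\u00e9 \\u2603\"\\n')\n"
)


def run_in_utf8_mode(script: Path, cwd: Path) -> str:
    """Run script with -X utf8 and read its output file as UTF-8."""
    subprocess.run([sys.executable, '-X', 'utf8', str(script)], cwd=cwd, check=True)
    return (cwd / 'messages.pot').read_text(encoding='utf-8')


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    script = tmp_path / 'child.py'
    script.write_text(CHILD_SCRIPT, encoding='utf-8')
    output = run_in_utf8_mode(script, tmp_path)
    # The snowman (U+2603) cannot be encoded in cp1252 at all.
    assert output == 'msgid "caf\u00e9 \u2603"\n'
```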

Use '-Xutf8'

c26d488

serhiy-storchaka approved these changes

Member

serhiy-storchaka left a comment

LGTM. 👍

bedevere-app bot added awaiting merge and removed awaiting core review labels

serhiy-storchaka enabled auto-merge (squash)

November 3, 2024 13:48

serhiy-storchaka added the needs backport to 3.12 (only security fixes) and needs backport to 3.13 (bugs and security fixes) labels

serhiy-storchaka merged commit dcae5cd into python:main

42 checks passed

miss-islington-app bot commented Nov 3, 2024

Thanks @tomasr8 for the PR, and @serhiy-storchaka for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

bedevere-app bot removed the awaiting merge label

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request

pythongh-104400: Add more tests to pygettext (pythonGH-108173)

3422519

(cherry picked from commit dcae5cd)
Co-authored-by: Tomas R. <tomas.roun8@gmail.com>

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request

pythongh-104400: Add more tests to pygettext (pythonGH-108173)

26bb10d

(cherry picked from commit dcae5cd)
Co-authored-by: Tomas R. <tomas.roun8@gmail.com>

bedevere-app bot commented Nov 3, 2024

GH-126361 is a backport of this pull request to the 3.13 branch.

bedevere-app bot removed the needs backport to 3.13 (bugs and security fixes) label

bedevere-app bot commented Nov 3, 2024

GH-126362 is a backport of this pull request to the 3.12 branch.

bedevere-app bot removed the needs backport to 3.12 (only security fixes) label

tomasr8 deleted the pygettext-tests branch

November 3, 2024 14:01

serhiy-storchaka pushed a commit that referenced this pull request

[3.12] gh-104400: Add more tests to pygettext (GH-108173) (GH-126362)

b0e08f5

(cherry picked from commit dcae5cd)
Co-authored-by: Tomas R <tomas.roun8@gmail.com>

serhiy-storchaka pushed a commit that referenced this pull request