[...] We need this to test with and without the --escape option. In particular, to see what encoding is used in the POT file and in the header. In a file with a non-UTF-8 encoding, also use characters not encodable with the source encoding (\uXXXX).

We currently set the charset of the POT file as the default encoding on the system (fp.encoding):

cpython/Tools/i18n/pygettext.py

Lines 574 to 576 in f5639d8

	def write_pot_file(messages, options, fp):
	    timestamp = time.strftime('%Y-%m-%d %H:%M%z')
	    encoding = fp.encoding if fp.encoding else 'UTF-8'
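
To illustrate the fallback in the snippet above, here is a minimal sketch that applies the same expression to a few in-memory streams. The helper name `pot_charset` is hypothetical and not part of pygettext; it only mirrors the quoted line.

```python
import io


def pot_charset(fp):
    # Hypothetical helper mirroring the expression quoted from write_pot_file.
    return fp.encoding if fp.encoding else 'UTF-8'


# Text streams with an explicit encoding report that encoding.
utf8_fp = io.TextIOWrapper(io.BytesIO(), encoding='utf-8')
cp1251_fp = io.TextIOWrapper(io.BytesIO(), encoding='cp1251')

# io.StringIO has no underlying byte encoding, so .encoding is None
# and the fallback applies.
string_fp = io.StringIO()

for fp in (utf8_fp, cp1251_fp, string_fp):
    print(pot_charset(fp))
```

The charset written to the POT header therefore follows whatever encoding the output stream was opened with, and only falls back to UTF-8 when the stream reports none.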

To have reproducible tests regardless of the OS they are running on, we set -X utf8 in the tests. As a consequence, the POT charset is always set to utf-8. I don't think there's an easy way to control that if we want to test other output encodings. At least with these tests we know that non-utf8 input files can be read correctly.
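
As a rough illustration of why -X utf8 pins the charset, the sketch below starts a child interpreter and reports the encoding that a plain `open()` call picks when none is given. It assumes the output file is opened without an explicit encoding; the helper name `default_text_encoding` is made up for this example.

```python
import subprocess
import sys


def default_text_encoding(*interpreter_args):
    """Return the encoding open() picks by default in a child interpreter."""
    code = "import os; f = open(os.devnull, 'w'); print(f.encoding); f.close()"
    result = subprocess.run(
        [sys.executable, *interpreter_args, '-c', code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# With UTF-8 mode enabled, the default is UTF-8 on every platform
# (the exact spelling/case of the name may vary between Python versions).
print(default_text_encoding('-X', 'utf8'))

# Without the flag, the result depends on the locale of the machine.
print(default_text_encoding())
```

The first call should report UTF-8 everywhere, while the second one is platform-dependent, which is the source of the non-reproducibility mentioned above.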

cc @serhiy-storchaka Let me know if this is what you had in mind for the tests!

Issue: pygettext: Improve test coverage #130197

Test various encodings with pygettext

e28559c

tomasr8 added the skip news label

Apr 7, 2025


Update test subdirs

e12bb90

tomasr8 requested a review from erlend-aasland as a code owner

April 7, 2025 22:01

serhiy-storchaka reviewed

Apr 8, 2025


serhiy-storchaka left a comment


I do not think that duplicating this test with multiple encodings is needed. It is enough to test with one encoding -- and it should not be Latin1 or Windows-1252, which are often the default encoding. The CPU time can be spent on different tests.

Please also add non-ASCII comments.

Finally, we need to add tests for non-ASCII filenames on a non-UTF-8 locale. I am afraid that i18n_data cannot be used for this -- we need to try several locales with different encodings and generate an input file with a corresponding name.

We also need to test the stderr output for files with a non-ASCII file name and a non-ASCII source encoding on a non-UTF-8 locale. It contains a file name and may contain a fragment of the source text.
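
One possible way to pick file names for such tests is sketched below. It only checks whether a candidate name can be encoded in a given encoding; it does not switch locales or run pygettext. The candidate list and the helper name `usable_names` are assumptions made for illustration, not existing test code.

```python
# Hypothetical candidates: (locale encoding, non-ASCII file name stem).
CANDIDATES = [
    ('cp1251', 'файл'),        # Cyrillic
    ('iso8859-7', 'αρχείο'),   # Greek
    ('shift_jis', 'ファイル'),  # Japanese
]


def usable_names(candidates):
    """Yield (encoding, file name) pairs whose name is encodable."""
    for encoding, stem in candidates:
        try:
            stem.encode(encoding)
        except UnicodeEncodeError:
            # The name cannot be represented in this encoding; skip it.
            continue
        yield encoding, stem + '.py'


for encoding, name in usable_names(CANDIDATES):
    print(encoding, name)
```

A real test would additionally need to find an installed locale that uses each encoding and skip the case when none is available.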