gh-130197: Test various encodings with pygettext #132244
Base branch: main
I do not think that duplicating this test with multiple encodings is needed. It is enough to test with one encoding -- and it should not be Latin-1 or Windows-1252, which are often the default encodings. The CPU time is better spent on other tests.
Please also add non-ASCII comments.
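A minimal sketch of such a single-encoding input, assuming koi8-r as the chosen encoding (it is neither Latin-1 nor Windows-1252): the source carries a PEP 263 coding cookie, a non-ASCII comment and a non-ASCII string. `SOURCE`, `make_source` and `read_back` are hypothetical names, and the check uses only the standard `tokenize` module, not pygettext itself.

```python
import io
import tokenize

# Hypothetical test input. koi8-r is an assumed choice of encoding.
SOURCE = '''\
# -*- coding: koi8-r -*-
# Комментарий для переводчика
_("Привет, мир")
'''

def make_source(encoding: str = "koi8-r") -> bytes:
    """Encode the test source with the encoding named in its cookie."""
    return SOURCE.encode(encoding)

def read_back(data: bytes) -> tuple[str, list[str], list[str]]:
    """Detect the declared encoding, then collect comment and string tokens."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    comments, strings = [], []
    for tok in tokenize.tokenize(io.BytesIO(data).readline):
        if tok.type == tokenize.COMMENT:
            comments.append(tok.string)
        elif tok.type == tokenize.STRING:
            strings.append(tok.string)
    return encoding, comments, strings
```

If the cookie were ignored and the bytes read as UTF-8, the Cyrillic comment and string would not decode, so a test built this way exercises the non-default path.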
Finally, we need to add tests for non-ASCII filenames on a non-UTF-8 locale. I am afraid that `i18n_data` cannot be used for this -- we need to try several locales with different encodings and generate an input file with a corresponding name.
We also need to test the stderr output for files with a non-ASCII filename and a non-ASCII source encoding on a non-UTF-8 locale. It contains the filename and may contain a fragment of the source text.
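When comparing stderr, note that `sys.stderr` always uses the `backslashreplace` error handler, so characters the locale encoding cannot represent appear as `\uXXXX` escapes. A small hypothetical helper for building the expected text, assuming the child's locale encoding is already known, might look like this:

```python
def expected_stderr(text: str, encoding: str) -> str:
    """Return text as it would look after being written to stderr.

    sys.stderr encodes with the 'backslashreplace' error handler, so
    characters unencodable in `encoding` become backslash escapes.
    """
    return text.encode(encoding, "backslashreplace").decode(encoding)

def stderr_mentions(raw: bytes, encoding: str, filename: str) -> bool:
    """Check that captured stderr bytes mention filename as stderr renders it."""
    return expected_stderr(filename, encoding) in raw.decode(encoding)
```

For a koi8-r locale, a Cyrillic filename appears unchanged, while a Japanese one appears as escapes; the same applies to any source fragment quoted in the message.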
For context: #131902 (comment)
We currently set the charset of the POT file to the system's default encoding (`fp.encoding`): cpython/Tools/i18n/pygettext.py
Lines 574 to 576 in f5639d8
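To illustrate the mechanism, here is a simplified sketch (not the pygettext source) in which the POT header's `charset` is taken from the output stream's `encoding` attribute; `HEADER_TEMPLATE` and `write_header` are hypothetical names.

```python
import io

# Simplified illustration only; the real pygettext header has more fields.
HEADER_TEMPLATE = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset={charset}\\n"\n'
    '"Content-Transfer-Encoding: 8bit\\n"\n'
)

def write_header(fp) -> None:
    """Write a minimal POT header whose charset follows fp.encoding."""
    fp.write(HEADER_TEMPLATE.format(charset=fp.encoding))
```

Because the charset simply mirrors whatever encoding the stream was opened with, it changes with the platform default unless that default is pinned.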
To have reproducible tests regardless of the OS they run on, we set `-X utf8` in the tests. As a consequence, the POT charset is always set to `utf-8`. I don't think there's an easy way to control that if we want to test other output encodings. At least with these tests we know that non-UTF-8 input files can be read correctly.

cc @serhiy-storchaka Let me know if this is what you had in mind for the tests!
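The effect of `-X utf8` can be checked directly: in UTF-8 Mode, `open()` defaults to UTF-8 regardless of the locale, so a stream opened without an explicit encoding reports UTF-8. The helper below is a hypothetical sketch that probes this in a child interpreter.

```python
import subprocess
import sys

# Child-side probe: report UTF-8 Mode and open()'s default encoding.
PROBE = (
    "import codecs, os, sys\n"
    "with open(os.devnull, 'w') as fp:\n"
    "    print(sys.flags.utf8_mode, codecs.lookup(fp.encoding).name)\n"
)

def default_output_encoding(*xoptions: str) -> tuple[int, str]:
    """Run a child with the given -X options; return (utf8_mode, encoding)."""
    args = [sys.executable]
    for opt in xoptions:
        args += ["-X", opt]
    out = subprocess.run(args + ["-c", PROBE], capture_output=True,
                         text=True, check=True).stdout.split()
    return int(out[0]), out[1]
```

With `-X utf8` the second value is always `utf-8`, which is why the generated POT charset cannot vary in these tests.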