Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-79516: allow msgfmt.py to compile multiple input po files#10875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
s-ball wants to merge68 commits intopython:main
base:main
Choose a base branch
Loading
froms-ball:multi_inputs

Conversation

s-ball
Copy link

@s-balls-ball commentedDec 3, 2018
edited by bedevere-appbot
Loading

msgfmt.py (from Tools/i18n) can now reliably be passed more than one input po file.

In addition, itsmake central function can reliably be called repeatedly, which fixesbpo-9741

https://bugs.python.org/issue35335

Test option processing, and conversion of one single file with orwithout the -o option.Also test the little documented behaviour of merging two input fileswith -o option.
Also show that it is now possible to build multiple po files in onesingle script call.
@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed thePSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please followthe steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You cancheck yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

s-balland others added3 commitsJanuary 23, 2025 19:32
When merging main into multi_inputs, the reference to os_helper waserroneously removed.
In 2018, all imports came from the test.support package. They are nowsplitted among various subpackages.
@s-ball
Copy link
Author

My patches now successfully pass all tests. Is there anything else I should do?

@zwarezware changed the titlebpo-35335: explicitely allows msgfmt.py to compile more than one input po filesgh-79516: explicitely allows msgfmt.py to compile more than one input po filesFeb 25, 2025
Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The GNUmsgfmt treats duplicated keys as conflicts. Silently ignoring can lead to bugs.

Comment on lines 114 to 115
BEWARE: in previous version of make the first parameter was a string
containing a single filename.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This note will soon became confusing, as it is not clear what version is previous. The right place for it is in the commit message, not in docstring.

# each PO file generates its corresponding MO file
for filename in filenames:
messages = {}
infile, outfile = get_names(filename, None)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Now we can refactorget_names(). Split it on two parts -- for infile and for outfile. The first part can be inlined and moved inprocess(), and the second part can be left here, inmake().

ID = 1
STR = 2
CTXT = 3
def make(filenames, outfile=None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

No default value is needed.

"""Tests for multiple input files
"""

def test_no_outputfile(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This test looks redundant.test_both_without_outputfile supersedes it, and this isMultiInputTest.

@tomasr8
Copy link
Member

The GNUmsgfmt treats duplicated keys as conflicts. Silently ignoring can lead to bugs.

We touched upon this a bit before:#10875 (comment) We could either fix it now or leave it for a followup PR. Which one would you prefer?

@serhiy-storchaka
Copy link
Member

If the solution is non-trivial, it can wait. I'm just worried that it may require rewriting the tests added here.

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
@s-ball
Copy link
Author

If the solution is non-trivial, it can wait. I'm just worried that it may require rewriting the tests added here.

It would be rather simple to test if a new msgid already exists and raise an exception. But GNU msgfmt gives more: the exact file and line of the first occurrence and of the second one. That part will require a non trivial work... For the tests part, the only change would be that the current merge should abort (the"" msgid is present in both files), and that we should provide a copy of the second PO file without its header to obtain the current merge.

Remove one redundant test from test_msgfmt.pyRemove unneeded changes from msgfmt.py

def get_names(filename, outfile):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It is no longer used.

# each PO file generates its corresponding MO file
for filename in filenames:
messages = {}
outfile = os.path.splitext(filename)[0] + '.mo'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There is a change in the behavior iffilename does not end with.po.

For testing, you can use other suffix for one of test PO files.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That is the reason whyoutfile was computed afterinfile. Asinfile is always computed the same way, beoutfile present or not, I externalized it in theget_names method when I write my initial patch in 2018. I am sorry but it was a long time ago and I had completely forgotten about that.

The only way I can imagine without code duplication is thatprocess returns its input filename. But I would not find that very nice on a separation of concern point of view. Do you prefer that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

After a second thought, that latter way is more efficient because the output file name is not computed when it was provided. I'll try to implement it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It is up to you. But computingoutfile when it was not used was not elegant.

You can factor out computation ofinfile (withoutoutfile) into a function and call it in both loops. Or you can pre-process the list of filenames.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The use ofnormcase() complicates the code (it was the right decision). In the end, your code withget_names() may be not the worst option. It only looked not elegant. In any case, please add more tests or modify existing tests to cover such special cases.

I am hesitant about adding a special test for input file with the.PO extension (the result will be different on Windows and non-Windows.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I could try to build a dedicated test decorated with@unittest.skipUnless(sys.platform.startswith("win"), "only for Windows")...

Fixes a small bug introduced in previous commit (changed behaviour when an input file has an extension other than .po)
@s-ball
Copy link
Author

Ok the tests for the infile/outfile computations are passing.

Back to the duplicate ids question, it should not be that hard:

  • add alid variable inprocess to record thelno value when a msgid is found
  • pass thatlid along withinfile to theadd method and store tuples(msgstr, infile, lid) in themessages dict
  • inadd, test whether a key is present before adding a new record and abort with a message - we now have the file and line number of the previous and current id
  • ingenerate replace the 2 occurrences ofmessages[id] withmessages[id][0]

For the tests, the currenttest_both_with_outputfile should fail because the empty msgid is duplicated in the second file. We should just add a new PO file without header to have a passing test.

This would allow a better GNU msgfmt compatibility.

Do you think that this should go into this PR or into a new one?

@s-ball
Copy link
Author

Started working on that point, and I fell not on technical but behavior questions.msgfmt chokes if more than one file contains a header, but the header is the only place where the encoding of the PO file can be declared. As GNU gettext contains a number of auxiliary programs that can help to manage a set of PO files this behavior is nice, butmsgfmt.py has not them and it would be a serious limitation. My opinion is thatmsgfmt.py should accept a header in every file but only store the first (or the last?) in the resulting MO file. That way users could process PO files with other encodings than UTF-8. But I would love to have other opinions on that...

If you think that this points requires a longer discussion, or that it should be discussed on a different place, then it means that we should handle it in a followup PR. If you think that just making a special case for the header is reasonable, then I could implement it in a couple of days.

s-balland others added3 commitsApril 4, 2025 09:39
compile_messages parameters order has changed in a previous commit to allow compiling multiple PO files.
@StanFromIreland
Copy link
Contributor

There will be a lot of conflicts with my hashing pr… this pr changes quite a lot. What do we want to do?

@s-ball
Copy link
Author

I was not very happy to reverse to parameters order ofcompile_messages because I thought it could disturb users if they were used to the precedent order. If there are concurrent PR (which I was absolutely not aware), this is clearly a blocking point. IMHO the only possible ways are to sequence the PR if there are only two of them, or to revert the modification ofcompile_messages, and use a different function to compile many PO files. I am sorry if I have made a mistake here. It is my first PR and I really do not know the usages. Anyway are there other sources of conflict?

@StanFromIreland
Copy link
Contributor

I guess it is up to@serhiy-storchaka who is the expert for msgfmt if I remember correctly as to what order we want these. (Who will have to deal with the conflicts)

@serhiy-storchaka
Copy link
Member

Before proceeding with this issue, we must solve other issues:--exclude-file,--omit-header, use the source file encoding. I think they are important for this issue.

@s-ball
Copy link
Author

s-ball commentedApr 4, 2025
edited
Loading

If I correctly understand, I shall just leave this PR as it is now, without even worrying for conflicts, and only come back when the other issues will be fixed. Not really a problem, but do not forget to wake me when it will be the time to fix the conflicts... Anyway it is far from a very important one even for me as I have understood that I had to use a private copy for my own project. I have already been working that way for more that 6 years after all... Stupid me again! If there are conflicts with other changes, it means that those changes have been merged and that the conflicts are to be resolved... I just shall keep on coming here from time to time and fix them when I see them. But unless being advised to act differently I shall not try to handle the duplicate ids question before the PR is merged in its current state to avoid adding other sources of possible conflicts.

@pythonpython deleted a commentApr 7, 2025
@python-cla-bot
Copy link

python-cla-botbot commentedApr 18, 2025
edited
Loading

All commit authors signed the Contributor License Agreement.

CLA signed

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@merwokmerwokmerwok approved these changes

@serhiy-storchakaserhiy-storchakaserhiy-storchaka left review comments

@tomasr8tomasr8tomasr8 left review comments

@StanFromIrelandStanFromIrelandStanFromIreland left review comments

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

8 participants
@s-ball@the-knights-who-say-ni@tomasr8@merwok@StanFromIreland@serhiy-storchaka@ezio-melotti@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp