Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-130197: pygettext: Fix and test the--exclude-file option#131381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
tomasr8 wants to merge6 commits intopython:main
base:main
Choose a base branch
Loading
fromtomasr8:pygettext-cli-exclude

Conversation

tomasr8
Copy link
Member

@tomasr8tomasr8 commentedMar 17, 2025
edited by bedevere-appbot
Loading

Fixes and adds tests for the--exclude-file option. This is what the option does:

--exclude-file=filename
Specify a file that contains a list of strings that are not be
extracted from the input files. Each string to be excluded must
appear on a line by itself in the file.

Previously, the file was read asf.readlines() which keeps the\n character for each excluded string. This made the option not work properly. I changed this tof.read.splitlines() which strips the newlines so the option should now work as expected.

@serhiy-storchaka
Copy link
Member

So,--exclude-file does not work and never worked as intended.

Note also that if it worked as intended, it would work differently than inxgettext. So why not implement thexgettext behavior?

@tomasr8
Copy link
MemberAuthor

So, --exclude-file does not work and never worked as intended.

Pretty much, yes. I randomly discovered this when writing the tests for it..

Note also that if it worked as intended, it would work differently than in xgettext. So why not implement the xgettext behavior?

If you're ok with that, I can implement it. The xgettext--exclude-file option reads the msgids from a PO file. Pygettext currently only knows how to write a PO file but not how to read it. msgfmt can read PO files so we could reuse the code from there. Perhaps create some helper file for PO reading/writing?

@serhiy-storchaka
Copy link
Member

For now, I think that we can just copy the code frommsgfmt.py. In future we may add the code for reading the PO files in thegettext module.

tomasr8 reacted with thumbs up emoji

@tomasr8
Copy link
MemberAuthor

Ok, I'll just copy it over then for now :)

@tomasr8
Copy link
MemberAuthor

In future we may add the code for reading the PO files in the gettext module.

That would actually be really helpful for my work as well. I'd be interested in contributing this. Have you already thought about how you'd like it to work? Should it be something compatible withGNUTranslations or do we want a richer interface that allows you to access e.g. the comments and flags as well?

@tomasr8
Copy link
MemberAuthor

For now, I think that we can just copy the code from msgfmt.py

Just to give an update, it probably won't be as simple. I started off by copying the code with minimal changes (just to get it to work with pygettext) and I discovered several bugs in the parsing logic when I started writing tests. I'll take some inspiration from the original code but I will need to make quite a few changes in order to make PO parsing work correctly.

Allows passing a regular PO file as an exclude file.All msgids in this file will be excluded when extracting.This adds PO parsing capability to pygettext.
Comment on lines +503 to +505
# - \n, \r, \t, \\, \", \a, \b, \f, \v
# - Octal escapes: \o, \oo, \ooo
# - Hex escapes: \xh, \xhh, ...
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is not standardized but I checked this with GNUmsgfmt and these are the ones that are allowed.

Comment on lines +304 to +306
f"The file {filename} starts with a UTF-8 BOM which is not "
"allowed in .po files.\nPlease save the file without a BOM "
"and try again.")
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is the same error we added tomsgfmt.py recently.

@tomasr8
Copy link
MemberAuthor

I implement PO parsing in order to make--exclude-file work as it does inxgettext. I started with the code inmsgfmt.py but as I wrote before, there are some issues with it (e.g. bugs, differences from GNUmsgfmt) which I fixed and added lots of tests (actually most of this PR are tests now, so hopefully it won't be that daunting to review 😄 )

So to summarize:

  • pygettext is now able to parse PO files. I also verified that the behaviour matches GNUmsgfmt (there's lots of edge cases, so I may have missed some though)
  • --exclude-file now accepts a PO file. All msgids from this file are ignored when extracting.

@serhiy-storchaka Do you think you could have a look at this when you have time? 🙂

Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thank you for your work.

I am sorry, but it seems I underestimated the size and complexity of the parsing code. Worst of all, this is code that can likely be changed in future (to fix minor errors or adding features). This turned out to be not such a good idea.

The code needs to be shared in some way. Maybe just import a function frommsgfmt?

Wait, how is it happened that the wholemsgfmt.py is smaller than the parsing code in this PR?

@@ -742,14 +1006,18 @@ class Options:
# initialize list of strings to exclude
if options.excludefilename:
try:
with open(options.excludefilename) as fp:
options.toexclude = fp.readlines()
options.toexclude = get_msgids_from_exclude_file(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It seems that inxgettext you can specify--exclude-file multiple times, and all msgids are added.

tomasr8 reacted with thumbs up emoji
@serhiy-storchaka
Copy link
Member

I think that it is better to fix the parser inmsgfmt first.

tomasr8 reacted with thumbs up emoji

@tomasr8
Copy link
MemberAuthor

I think that it is better to fix the parser inmsgfmt first.

Originally, I was meaning to port this change to msgfmt after this is merged, but I can do the opposite. I'll send a PR to fix the msgfmt parser hopefully in a few days!

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@serhiy-storchakaserhiy-storchakaserhiy-storchaka left review comments

Assignees
No one assigned
Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

2 participants
@tomasr8@serhiy-storchaka

[8]ページ先頭

©2009-2025 Movatter.jp