Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-126700: pygettext: Support more gettext functions#126912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
serhiy-storchaka merged 14 commits intopython:mainfromtomasr8:ngettext
Nov 22, 2024

Conversation

@tomasr8
Copy link
Member

@tomasr8tomasr8 commentedNov 16, 2024
edited by bedevere-appbot
Loading

pygettext currently only supports the single-argument form_('foo'). This can be extended via the--keyword CLI argument but all functions are assumed to be single-argument as well. That means nongettext,pgettext, etc..pygettext is currently not able to emitmsgctxt andmsgid_plural at all.

This PR fixes that by adding support for all standard gettext functions:

gettextngettextpgettextnpgettextdgettextdngettextdpgettextdnpgettext

These can be extended using--keyword=KEYWORD but all such keywords are still assumed to be single-argument. Adding CLI support requires parsing thexgettext keyword specifier format (e.g.pgettext:1c,2) which can be done in a followup PR.

Adding support for other gettext functions means that pygettext now extracts calls that may seem a bit questionable such as_(x="kwargs") but it is consistent with other extractors likexgettext andpybabel. This is also the price for using a token-based extractor. If we want to keep the code reasonably simple, we'll need to accept some false positives. I would eventually like to see pygettext switch to an AST-based extractor which would eliminate these issues.

I tried to keep the diff minimal, but some larger changes were needed. Feedback welcome!

DEFAULTKEYWORDS=', '.join(default_keywords)

EMPTYSTRING=''
__version__='1.6'
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I bumped the version since this adds some new capabilities, but let me know if it's not needed!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It makes sense if the script is separately distributed. But when it is the part of the Python distribution, I think that we should use the Python version. We can discuss this in a separate issue.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Good point, I reverted that change. My original reasoning was that since the version is written to the POT file we might want to bump it up but I agree that it should use the Python version itself, not a separate version.

f"{_('foo', 'bar')}"
'''))
self.assertNotIn('foo',msgids)
self.assertIn('foo',msgids)
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Bothxgettext andpybabel extract this and it would take a lot of effort to disallow this in general with the current extractor, so I'd leave this for now at least.

serhiy-storchaka reacted with thumbs up emoji

# calculate all keywords
options.keywords.extend(default_keywords)
options.keywords= {kw: {0:'msgid'}forkwinoptions.keywords}
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

--keyword still works but the keywords are assumed to be single-argument. A followup PR could add support for more.

Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Great! I just took a look and will do a more detailed review tomorrow. So far I see two problems:

  1. It does not work correctly when the first argument щаdgettext is a complex expression that includes parentheses or commas. Nested parentheses should be counted as in__suiteseen.
  2. No warning is issued when the expected argument is not a string literal. Such bug were recently fixed inargparse. The new i18nargparse tests would catch such bugs, becausepygettext emits warnings, but with this PR it silently ignores them.

@tomasr8
Copy link
MemberAuthor

It does not work correctly when the first argument ща dgettext is a complex expression that includes parentheses or commas. Nested parentheses should be counted as in __suiteseen.

That should be fixed now! I used the same mechanism that__suiteseen does.

No warning is issued when the expected argument is not a string literal. Such bug were recently fixed in argparse. The new i18n argparse tests would catch such bugs, because pygettext emits warnings, but with this PR it silently ignores them.

I restored the warnings. I initially removed them in order to allow extraction of keyword arguments (e.g._(x="foo")) which is supported byxgettext andpybabel. Now that the warnings are restored, this is not allowed anymore since properly parsing those would add a lot of complexity (it wasn't allowed before either so the behaviour ofpygettext does not change).

Note that there are still a lot of edge cases when it comes to extraction which I didn't want to address in this PR. For instance, extraction of nested constructs such as_('foo', param=_('bar')) and f-strings. Addressing those would be considerably easier if we eventually switched to a parser-based approach as in#104402. If you think it's worthwhile I'd be happy to continue work on that PR once this lands 🙂

argument '%(argument_name)s' is deprecated
can't open '%(filename)s': %(error)s
command '%(parser_name)s' is deprecated
conflicting option string: %s

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Shouldmsgid_plural also be output? Or do this in the following PR?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We might also want to includemsgctxt even though I don't think any messages usepgettext currently. I was thinking instead of using a list of msgids, why not use the generated POT file?

We initially rejected that idea when adding the snapshots because of potentially changing line locations, but pygettext has an option to turn those off.

We could even add an option to not emit the header (pybabel has this for instance). Then the snapshots would only change if the strings themselves change.

DEFAULTKEYWORDS=', '.join(default_keywords)

EMPTYSTRING=''
__version__='1.6'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It makes sense if the script is separately distributed. But when it is the part of the Python distribution, I think that we should use the Python version. We can discuss this in a separate issue.

self.__messages= {}
self.__state=self.__waiting
self.__data= []
self.__data=defaultdict(str)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why use defaultdict?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

eliftstringin')]}':
self.__enclosurecount-=1
elifexpect_string_literal:
# We are inside an argument which is a translatable string and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think this can be merged with the below. But I can be wrong.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Hmm I don't think it can, unless I am misunderstading what you mean

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It is not so important now, after you moved the ugly part of the code intowarn_unexpected_token(), but you can avoid the code duplication by using earlierreturns.

ifttype==tokenize.OPandself.__enclosurecount==0:iftstring==')':        ...returniftstring==',':        ...returnifexpect_string_literal:    ...# handle string literals, comments, etcelse:    ...# handle parentheses

tomasr8 reacted with thumbs up emoji
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nice! I wasn't considering early returns but I think it's more digestible this way. I updated the code and added more test cases :)

Comment on lines 501 to 504
eliftstringin'([{':
self.__enclosurecount+=1
eliftstringin')]}':
self.__enclosurecount-=1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These are invalid ifexpect_string_literal is true.

I suspect that this does not work correctly for_('string'[i]).

See the comment below.

eliftstringin')]}':
self.__enclosurecount-=1
elifexpect_string_literal:
# We are inside an argument which is a translatable string and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It is not so important now, after you moved the ugly part of the code intowarn_unexpected_token(), but you can avoid the code duplication by using earlierreturns.

ifttype==tokenize.OPandself.__enclosurecount==0:iftstring==')':        ...returniftstring==',':        ...returnifexpect_string_literal:    ...# handle string literals, comments, etcelse:    ...# handle parentheses

tomasr8 reacted with thumbs up emoji
Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM. 👍

@serhiy-storchakaserhiy-storchaka merged commit0a1944c intopython:mainNov 22, 2024
40 checks passed
@tomasr8tomasr8 deleted the ngettext branchNovember 22, 2024 19:01
ebonnal pushed a commit to ebonnal/cpython that referenced this pull requestJan 12, 2025
…126912)Support multi-argument gettext functions: ngettext(), pgettext(), dgettext(), etc.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@serhiy-storchakaserhiy-storchakaserhiy-storchaka approved these changes

@savannahostrowskisavannahostrowskiAwaiting requested review from savannahostrowski

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

2 participants

@tomasr8@serhiy-storchaka

[8]ページ先頭

©2009-2025 Movatter.jp