
bedevere-appbot mentioned this pull request

pygettext: Add support for multi-argument gettext functions #126700

Closed

tomasr8 commented

Lib/test/test_tools/test_i18n.py

Tools/i18n/pygettext.py Outdated

		DEFAULTKEYWORDS = ', '.join(default_keywords)

		EMPTYSTRING = ''
		__version__ = '1.6'

tomasr8 (Member, Author), Nov 16, 2024

I bumped the version since this adds some new capabilities, but let me know if it's not needed!

serhiy-storchaka (Member), Nov 18, 2024

It makes sense if the script is distributed separately. But when it is part of the Python distribution, I think that we should use the Python version. We can discuss this in a separate issue.

tomasr8 (Member, Author), Nov 18, 2024

Good point, I reverted that change. My original reasoning was that, since the version is written to the POT file, we might want to bump it, but I agree that it should use the Python version itself rather than a separate version.

		f"{_('foo', 'bar')}"
		'''))
		- self.assertNotIn('foo', msgids)
		+ self.assertIn('foo', msgids)

tomasr8 (Member, Author), Nov 16, 2024

Both xgettext and pybabel extract this, and it would take a lot of effort to disallow it in general with the current extractor, so I'd leave this for now at least.

Tools/i18n/pygettext.py


		# calculate all keywords
		options.keywords.extend(default_keywords)
		options.keywords = {kw: {0: 'msgid'} for kw in options.keywords}

tomasr8 (Member, Author), Nov 16, 2024

--keyword still works, but the keywords are assumed to be single-argument. A follow-up PR could add support for more.
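A minimal sketch of the keyword table this implies, assuming each keyword maps argument positions to message roles. The names `DEFAULT_SPECS` and `build_specs` and the default entries are illustrative assumptions, not the code in the PR; only the `{0: 'msgid'}` shape for user keywords comes from the diff above.

```python
# Hypothetical table: keyword -> {argument position: role in the message}.
DEFAULT_SPECS = {
    '_': {0: 'msgid'},
    'gettext': {0: 'msgid'},
    'ngettext': {0: 'msgid', 1: 'msgid_plural'},
    'pgettext': {0: 'msgctxt', 1: 'msgid'},
}


def build_specs(user_keywords):
    """Merge the defaults with --keyword names, treated as single-argument."""
    specs = dict(DEFAULT_SPECS)
    for kw in user_keywords:
        # A user-supplied keyword only marks its first argument as the msgid.
        specs.setdefault(kw, {0: 'msgid'})
    return specs


specs = build_specs(['my_gettext'])
```

With this shape, a later change could let --keyword carry its own position spec without touching the extractor loop.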

serhiy-storchaka reviewed

serhiy-storchaka (Member) left a comment

Great! I just took a look and will do a more detailed review tomorrow. So far I see two problems:

1. It does not work correctly when the first argument of dgettext is a complex expression that includes parentheses or commas. Nested parentheses should be counted as in __suiteseen.
2. No warning is issued when the expected argument is not a string literal. Such bugs were recently fixed in argparse. The new i18n argparse tests would catch such bugs, because pygettext emits warnings, but with this PR it silently ignores them.
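The first problem can be illustrated with a small tokenizer-based sketch that splits call arguments only at commas seen at nesting depth zero. The helper `split_call_args` is hypothetical and is not how pygettext is structured; it only shows the counting idea.

```python
import io
import tokenize

# Token types that carry no argument content.
_SKIP = (tokenize.NL, tokenize.NEWLINE, tokenize.COMMENT,
         tokenize.ENDMARKER, tokenize.INDENT, tokenize.DEDENT)


def split_call_args(source):
    """Split the arguments of the first call in `source` at top-level commas.

    Returns one list of token strings per argument.
    """
    args, current, depth, started = [], [], 0, False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if not started:
            # Skip the function name; collect after the first '('.
            if tok.type == tokenize.OP and tok.string == '(':
                started = True
            continue
        if tok.type == tokenize.OP:
            if depth == 0 and tok.string == ')':
                if current:
                    args.append(current)
                return args
            if depth == 0 and tok.string == ',':
                args.append(current)
                current = []
                continue
            # Track nested (), [] and {} so inner commas are not separators.
            if tok.string in '([{':
                depth += 1
            elif tok.string in ')]}':
                depth -= 1
        current.append(tok.string)
    return args


args = split_call_args("dgettext(domain(a, b)[0], 'hello')")
```

Here the comma inside `domain(a, b)` is seen at depth 1 and stays part of the first argument, so the string literal is correctly found as the second argument.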

tomasr8 added 3 commits

November 17, 2024 11:10

Correctly count enclosures

62d6455

Restore warnings for invalid arguments

496f5d9

Remove extra space

06186a0

tomasr8 (Member, Author) commented Nov 17, 2024

> It does not work correctly when the first argument of dgettext is a complex expression that includes parentheses or commas. Nested parentheses should be counted as in __suiteseen.

That should be fixed now! I used the same mechanism that __suiteseen does.

> No warning is issued when the expected argument is not a string literal. Such bugs were recently fixed in argparse. The new i18n argparse tests would catch such bugs, because pygettext emits warnings, but with this PR it silently ignores them.

I restored the warnings. I initially removed them in order to allow extraction of keyword arguments (e.g. _(x="foo")), which is supported by xgettext and pybabel. Now that the warnings are restored, this is no longer allowed, since properly parsing those would add a lot of complexity (it wasn't allowed before either, so the behaviour of pygettext does not change).

Note that there are still a lot of edge cases when it comes to extraction which I didn't want to address in this PR, for instance extraction of nested constructs such as _('foo', param=_('bar')) and f-strings. Addressing those would be considerably easier if we eventually switched to a parser-based approach as in #104402. If you think it's worthwhile, I'd be happy to continue work on that PR once this lands 🙂
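To illustrate why a parser-based approach handles these edge cases more easily, here is a rough sketch built on the standard `ast` module. It is not the code from #104402; `KEYWORDS` and `extract` are hypothetical names, and a real tool would emit a warning where this sketch simply skips the call.

```python
import ast

# Hypothetical keyword table: argument position -> message role.
KEYWORDS = {
    '_': {0: 'msgid'},
    'ngettext': {0: 'msgid', 1: 'msgid_plural'},
}


def extract(source):
    """Return one {role: text} dict per recognised call in `source`."""
    messages = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        spec = KEYWORDS.get(node.func.id)
        if spec is None:
            continue
        msg = {}
        for pos, role in spec.items():
            if pos >= len(node.args):
                break  # too few positional arguments
            arg = node.args[pos]
            if not (isinstance(arg, ast.Constant) and isinstance(arg.value, str)):
                break  # not a string literal; a real tool would warn here
            msg[role] = arg.value
        else:
            # Only reached when every expected argument was a string literal.
            messages.append(msg)
    return messages


src = "_('foo', param=_('bar'))\nx = f\"{_('baz')}\"\n"
msgids = sorted(m['msgid'] for m in extract(src))
```

Because the tree walk visits every call node, the nested call inside the keyword argument and the call inside the f-string are found without any extra state tracking.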

Simplify code

3d67a7a

tomasr8 commented

Nov 17, 2024

Lib/test/test_tools/i18n_data/messages.py (outdated, resolved)

Update comment

48070d5

serhiy-storchaka reviewed

Nov 18, 2024

Lib/test/translationdata/argparse/msgids.txt

		argument '%(argument_name)s' is deprecated
		can't open '%(filename)s': %(error)s
		command '%(parser_name)s' is deprecated
		conflicting option string: %s

serhiy-storchaka (Member), Nov 18, 2024

Should msgid_plural also be output? Or do this in the following PR?

tomasr8 (Member, Author), Nov 18, 2024

We might also want to include msgctxt, even though I don't think any messages use pgettext currently. I was thinking: instead of using a list of msgids, why not use the generated POT file?

We initially rejected that idea when adding the snapshots because of potentially changing line locations, but pygettext has an option to turn those off.

We could even add an option to not emit the header (pybabel has this for instance). Then the snapshots would only change if the strings themselves change.

Misc/NEWS.d/next/Tools-Demos/2024-11-16-20-47-20.gh-issue-126700.ayrHv4.rst (outdated, resolved)

Tools/i18n/pygettext.py

		  self.__messages = {}
		  self.__state = self.__waiting
		- self.__data = []
		+ self.__data = defaultdict(str)

serhiy-storchaka (Member), Nov 18, 2024

Why use defaultdict?

tomasr8 (Member, Author), Nov 18, 2024

So that I can do this one-liner 😄 :

https://github.com/python/cpython/pull/126912/files#diff-27bf55510663a73d2fdea1e604efdb59e0115378530202b5c55d04656dedece2R513
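A small sketch of the kind of one-liner a `defaultdict(str)` allows, assuming the extractor accumulates text per argument index (for example when adjacent string literals are implicitly concatenated). The variable names are illustrative and not taken from the linked line.

```python
from collections import defaultdict

# Hypothetical state: argument index -> text collected so far.
data = defaultdict(str)

# Pieces as they might arrive token by token, e.g. for
# ngettext('Hello, ' 'world', 'plural form', n).
pieces = [(0, 'Hello, '), (0, 'world'), (1, 'plural form')]

for arg_index, text in pieces:
    # With a plain dict, the first '+=' for a new key would raise KeyError;
    # defaultdict(str) starts every missing key at ''.
    data[arg_index] += text
```

The alternative with a plain dict would be `data[arg_index] = data.get(arg_index, '') + text`, which works but is noisier.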

Tools/i18n/pygettext.py (outdated, resolved)

Tools/i18n/pygettext.py Outdated

		elif tstring in ')]}':
		    self.__enclosurecount -= 1
		elif expect_string_literal:
		    # We are inside an argument which is a translatable string and

serhiy-storchaka (Member), Nov 18, 2024

I think this can be merged with the below. But I can be wrong.

tomasr8 (Member, Author), Nov 18, 2024

Hmm, I don't think it can, unless I am misunderstanding what you mean.

serhiy-storchaka (Member), Nov 21, 2024

It is not so important now, after you moved the ugly part of the code into warn_unexpected_token(), but you can avoid the code duplication by using early returns.

		if ttype == tokenize.OP and self.__enclosurecount == 0:
		    if tstring == ')':
		        ...
		        return
		    if tstring == ',':
		        ...
		        return
		if expect_string_literal:
		    ...  # handle string literals, comments, etc
		else:
		    ...  # handle parentheses

tomasr8 (Member, Author), Nov 21, 2024

Nice! I wasn't considering early returns but I think it's more digestible this way. I updated the code and added more test cases :)

tomasr8 and others added 4 commits

November 18, 2024 18:27

Only extract when __enclosure_count is 0

d6fd789

Keep the old version

3497690

Improve news entry

192187e

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

Update snapshots

9cfc901

tomasr8 requested a review from serhiy-storchaka

November 21, 2024 09:28

serhiy-storchaka reviewed

Nov 21, 2024

Tools/i18n/pygettext.py Outdated

Comment on lines 501 to 504

		elif tstring in '([{':
		    self.__enclosurecount += 1
		elif tstring in ')]}':
		    self.__enclosurecount -= 1

serhiy-storchaka (Member), Nov 21, 2024

These are invalid if expect_string_literal is true.

I suspect that this does not work correctly for _('string'[i]).

See the comment below.
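The concern about _('string'[i]) is easier to see from the token stream. The sketch below only lists the tokens; `token_kinds` is a hypothetical helper, not part of pygettext.

```python
import io
import tokenize


def token_kinds(source):
    """Return (token type name, token text) pairs, skipping layout tokens."""
    skip = (tokenize.NL, tokenize.NEWLINE, tokenize.ENDMARKER)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in skip
    ]


kinds = token_kinds("_('string'[i])")
```

The string literal is immediately followed by an opening bracket, so an extractor that merely bumps its enclosure counter there would record 'string' as the msgid even though the argument is really a subscript expression. While a string literal is expected, such a bracket has to be treated as an unexpected token.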

Tools/i18n/pygettext.py Outdated

		elif tstring in ')]}':
		    self.__enclosurecount -= 1
		elif expect_string_literal:
		    # We are inside an argument which is a translatable string and

serhiy-storchaka (Member), Nov 21, 2024

It is not so important now, after you moved the ugly part of the code into warn_unexpected_token(), but you can avoid the code duplication by using early returns.

		if ttype == tokenize.OP and self.__enclosurecount == 0:
		    if tstring == ')':
		        ...
		        return
		    if tstring == ',':
		        ...
		        return
		if expect_string_literal:
		    ...  # handle string literals, comments, etc
		else:
		    ...  # handle parentheses

Refactor __openseen

24851c5

serhiy-storchaka approved these changes

serhiy-storchaka (Member) left a comment

LGTM. 👍

bedevere-appbot added awaiting merge and removed awaiting review labels

serhiy-storchaka merged commit 0a1944c into python:main

40 checks passed

bedevere-appbot removed the awaiting merge label