Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32k
gh-130703: Implement wrapping to width for msgids#130705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
Requesting@tomasr8@serhiy-storchaka :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This does not work.
- It can break escape sequences.
- The normalized message can already be multiline. Splitting it again will produce too short lines and even empty lines.
I need to update normalize to wrap respecting words |
picnixz left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can't you usetextwrap.wrap
for wrapping? It's not perfect but it ought to detect do most of the job?
tomasr8 commentedFeb 28, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I'm afraid textwrap won't always work. I suggest adding the wrapping logic to the normalize function. pybabel does it in a similar way, you can have a look at their implementation:https://github.com/python-babel/babel/blob/master/babel/messages/pofile.py#L464 |
Implemented pybabels method. |
picnixz left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm not really the best for reviewing this but I can review the implementation. Please, don't just apply my suggestions as is and decide which one is the best.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm not sure that the regex approach is correct. It would gobble up consecutive spaces right?
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
I really recommend creating a dummy file with some gettext calls and comparing the differences between pygettext, xgettext and babel. There are some differences that should be considered. Here's two I noticed:
_('foos') ran with msgid """""foos"msgstr "" while xgettext and babel give me this (i.e. they don't insert two extra msgid"foos"msgstr"" |
StanFromIreland commentedMar 2, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
As for the header, this will conflict with my implementation of |
StanFromIreland commentedMar 2, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Test fail unrelated. Wrapping header will require a separate function like so: Subject: [PATCH] Wrap header---Index: Tools/i18n/pygettext.pyIDEA additional info:Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP<+>UTF-8===================================================================diff --git a/Tools/i18n/pygettext.py b/Tools/i18n/pygettext.py--- a/Tools/i18n/pygettext.py(revision 8d03cbf141068c4ac9812a967a4c9f5942e22d75)+++ b/Tools/i18n/pygettext.py(date 1740913002600)@@ -589,12 +589,36 @@ def _is_string_const(self, node): return isinstance(node, ast.Constant) and isinstance(node.value, str)++def _wrap_header(s, options):+ lines = []+ for line in s.splitlines():+ if len(line) > options.width and ' ' in line:+ words = _space_splitter(line)+ words.reverse()+ buf = []+ size = 0+ while words:+ word = words.pop()+ if size + len(word) <= options.width:+ buf.append(word)+ size += len(word)+ else:+ lines.append(''.join(buf))+ buf = [word]+ size = len(word)+ lines.append(''.join(buf))+ else:+ lines.append(line)+ return "\n".join(lines) + "\n"++ def write_pot_file(messages, options, fp): timestamp = time.strftime('%Y-%m-%d %H:%M%z') encoding = fp.encoding if fp.encoding else 'UTF-8'- print(pot_header % {'time': timestamp, 'version': __version__,+ print(_wrap_header(pot_header % {'time': timestamp, 'version': __version__, 'charset': encoding,- 'encoding': '8bit'}, file=fp)+ 'encoding': '8bit'}, options), file=fp) # Sort locations within each message by filename and lineno sorted_keys = [ |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It looks almost ready now. Please add more tests for cases. It may be convenient to use the same string with different widths.
Add tests for the cases whenlen(escaped_line) + len(prefix) + 3
equals towidth
and when it equals towidth
.
Add tests for the cases whennew_size + 2
equals towidth
and when it equals towidth + 1
.
Add tests for too long first word (new_size + 2 >
widthand
buf` is empty) and for too long last word.
Add tests for whitespaces other than' '
and'\n'
(e.g. for'\t'
and'\r'
), for non-ASCII line separators and whitespaces. Test for different escaping mode.
Do not add a separate method for every case. Group assertions for similar cases in one method.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Friendly ping@serhiy-storchaka :-) |
Sorry, the tests still do not satisfy me. I am going to play with them myself, and then propose my variant. |
Uh oh!
There was an error while loading.Please reload this page.
width
is not implemented for msgids #130703