Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.2k
bpo-24665: Add CJK support in textwrap by default.#5649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
width = 0 | ||
pos = 0 | ||
for char in text: | ||
width += 2 if east_asian_width(char) in {'F', 'W'} else 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why inlining _len(), I don't have seen performance issues and it's less readable (less pythonic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
How do you know where to break once you have the whole value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Hello I was reading too fast. In my version there's the_wide
boolean function.
So herewidth += _wide(char) + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
And_len
is justreturn sum(2 if _wide(char) else 1 for char in text)
with no performance issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
More pythonic, DRY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I won't bet on the performances, calling _len from _slice adds two functions calls per character (one to _len and one to sum). In one case I'm doing it on a character, and in the other case in a whole string. Yes I could also factorize this ternary to a third function, but I don't find it more readable.
width += 2 if east_asian_width(char) in {'F', 'W'} else 1 | ||
if width > index: | ||
break | ||
pos += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why note use enumerate(), it's less readable (less pythonic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Because it does not works with enumerate as the last incrementation were not done. I do not remember which case exactly but if you run the unit test you'll spot it easily, it was failing, I'll do if needed but can't right now.
fgallaireFeb 15, 2018 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Interested in that, the code was strongly tested for txt2tags and don't catch this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Your initial implementation was working thanks to yourif cjk_len(text) <= index: return text, ''
fixing the special case explicitly, I may have tried to avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
"Explicit is better than implicit." but the more important is that both solutions are correct.
fgallaire commentedFeb 14, 2018
Don't see my author credit |
fgallaire commentedFeb 15, 2018
And you miss the |
45fd84d
to4623375
CompareCo-authored-by: Florent Gallaire <fgallaire@gmail.com>
You're right! And trying to split a wide character yield to an infinite loop.
Gladly fixed and co-authored you. |
fgallaire commentedMar 6, 2018
Thanks@JulienPalard, I'm so happy ! I had almost lost hope to see this issue fixed. |
if self.width <= 0: | ||
raise ValueError("invalid width %r (must be > 0)" % self.width) | ||
elif self.width == 1 and _width(text) > len(text): | ||
raise ValueError("invalid width 1 (must be > 1 when CJK chars)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I have done a more complex solution:
elif self.width == 1 and (sum(self._width(chunk) for chunk in chunks) > sum(len(chunk) for chunk in chunks)):
It throws the exception earlier, but it's probably not absolutely necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The change I request is that this be closed because it is conceptually wrong. Textwrap works in terms of abstract 'characters' (codepoint), not physical units. I will explain this on the issue.
Aside from that, 2 is the wrong number to add, as 'double-width' characters are not actually twice as wide as fixed-pitch Ascii chars of the same height. See the issue for this as well.
bedevere-bot commentedJul 8, 2018
When you're done making the requested changes, leave the comment: |
Uh oh!
There was an error while loading.Please reload this page.
Related to:
https://bugs.python.org/issue24665