Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

bpo-24665: double-width CJK chars support for textwrap#89

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
fgallaire wants to merge8 commits intopython:masterfromfgallaire:master

Conversation

fgallaire
Copy link

@fgallairefgallaire commentedFeb 14, 2017
edited
Loading

  • Add ckj option flag, default to False
  • Add cjk_wide(), cjk_len() and cjk_slices() utilities

* Add ckj option flag, default to False* Add cjkwide(), cjklen() and cjkslices() utilities
@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed thePSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username onbugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:

  1. If you don't have an account on b.p.o, pleasecreate one
  2. Make sure your GitHub username is listed in"Your Details" at b.p.o
  3. If you have not already done so, please sign thePSF contributor agreement
  4. If you just signed the CLA, pleasewait at least one US business day and then check "Your Details" onbugs.python.org to see if your account has been marked as having signed the CLA (the delay is due to a person having to manually check your signed CLA)
  5. Reply here saying you have completed the above steps

Thanks again to your contribution and we look forward to looking at it!

@fgallaire
Copy link
Author

fgallaire commentedFeb 14, 2017
edited
Loading

I now have listed my GitHub username at bpo and i had already signed the CLA.

@@ -114,6 +117,7 @@ class TextWrapper:

def __init__(self,
width=70,
cjk=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These arguments can be passed as positional argument.
So new argument should be added last.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I agree

i = 1
# <= and i-1 to catch the last double length char of odd line
while cjklen(text[:i]) <= index:
i = i + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't like this O(n^2) algorithm.

w = 0for i, c in enumerate(text):    w += cjkwide(c) + 1    if w > index:        break

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Very relevant point.

@fgallairefgallaireforce-pushed themaster branch 2 times, most recently frome52db22 to5c4b2afCompareFebruary 14, 2017 09:26
Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You need to document the name option in Doc/library/textwrap.rst, don't forget ".. versionchanged:: 3.7" markup.

@@ -139,6 +145,7 @@ def __init__(self,
self.max_lines = max_lines
self.placeholder = placeholder

self.len = cjklen if self.cjk else len
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I suggest to make it private and use a different name than len: self._text_width() for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Exposing functions mean that you must document them, write unit tests, and that someone (maybe not you) have to maintain them. I don't think that it's worth it. Let's try with something simple, make these functions private.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I agree

@@ -365,7 +380,7 @@ def fill(self, text):

# -- Convenience interface ---------------------------------------------

def wrap(text, width=70, **kwargs):
def wrap(text, width=70,cjk=False,**kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There is not need to repeat cjk here, there is already a generic **kwargs. Same for other functions below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is the same need as width: to be visible so easily usable.

"""Return True if char is Fullwidth or Wide, False otherwise.
Fullwidth and Wide CJK chars are double-width.
"""
return unicodedata.east_asian_width(char) in ('F', 'W')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You can write {'F', 'W'}, it's optimized as a constant frozenset.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It's really faster than a tuple ?

return w.fill(' '.join(text.strip().split()))


# -- CJK support ------------------------------------------------------

def cjkwide(char):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please add _ in the name: cjk_wide().

Note sure about the name: is_cjk_wide_char()?

Do we really need to make the function?

return unicodedata.east_asian_width(char) in ('F', 'W')


def cjklen(text):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Rename to cfk_width()?

"""Return the real width of text (its len if not a string).
"""
if not isinstance(text, str):
return len(text)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't understand this case. Why do you pass a non-string to the function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is for people who want to do:
from textwrap import cjklen as len
and use cjklen transparently

return sum(2 if cjkwide(char) else 1 for char in text)


def cjkslices(text, index):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

IMHO it's better to make all these new functions private: add _ prefix. Rename to _cjk_slices().

"index": is it a number of character or a width? Maybe rename to width?

Copy link
Author

@fgallairefgallaireFeb 14, 2017
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

No, I really want this functions to be exported, they are useful

@vstinner
Copy link
Member

By email I proposed a different design. Make existing TextWrapper extensible: add text_width() and truncate() methods. Then add a new TextWrapperCJK which overrides these methods. It would allow to reuse the code for other cases than just CJK.

# Written by Greg Ward <gward@python.net>

import re
import re, unicodedata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

One import per line please.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I agree

@@ -139,6 +145,7 @@ def __init__(self,
self.max_lines = max_lines
self.placeholder = placeholder

self.len = cjklen if self.cjk else len
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Exposing functions mean that you must document them, write unit tests, and that someone (maybe not you) have to maintain them. I don't think that it's worth it. Let's try with something simple, make these functions private.

@vstinner
Copy link
Member

Your change breaks Python build: Python requires optparse to compile modules like unicodedata, optparse imports textwrap which nowalways requires unicodedata.

Using two different classes, it would be simpler to only import unicodedata when the TextWrapperCJK class is instanciated, "on demand", and so fix the bootstrap issue.

@fgallaire
Copy link
Author

fgallaire commentedFeb 15, 2017
edited
Loading

What about a depreciation warning to inform that cjk default will switch toTrue in Python 3.8 ?

@fgallaire
Copy link
Author

fgallaire commentedFeb 15, 2017
edited
Loading

optparse is deprecated since Python 3.2 so it should not drive this work, but a port to argparse will not solve this problem because of the same cycling dependency.

@codecov
Copy link

codecovbot commentedFeb 15, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@984eef7).Click here to learn what that means.
The diff coverage is72.5%.

@@            Coverage Diff            @@##             master      #89   +/-   ##=========================================  Coverage          ?   82.38%           =========================================  Files             ?     1428             Lines             ?   351193             Branches          ?        0           =========================================  Hits              ?   289333             Misses            ?    61860             Partials          ?        0

Continue to review full report at Codecov.

Legend -Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing data
Powered byCodecov. Last update984eef7...54de7aa. Read thecomment docs.

Copy link
Member

@methanemethane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

While I agree supporting east_asian_wdith is preferable, I don't like current API.
It's OK for third party library. But for standard library, I prefer more generic API and implementation.

While "practical beats purity", I want glibc's wcwidth for UTF-8 at least. (bonus: option like ambiwidth=double in vim)


.. function:: cjk_len(text)

Return the real width of *text* (its len if not a string).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I don't like wordingreal.
How about zero-width space? How about combining character sequence? variation selector?
EMOJI MODIFIER?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You're probably right. Thereal purpose is to help to understand that width (cjk_len) and len are different. What do you think aboutvisual instead ?

@yan12125
Copy link
Contributor

I want glibc's wcwidth for UTF-8 at least

Disagreed. I guess text editors and terminals are top two applications in textwrap. I known little about text editors. In terminals, things are quite interesting. Different terminals use different versions ofEastAsianWidth.txt. Here are some examples:

  • tmux, QTerminal: use wcwidth() by default. In the latest glibc, it's still Unicode 8.0. [1] On Mac, it's reported that wcwidth() is broken [2].
  • VTE (gnome-terminal, xfce4-terminal, etc.): Unicode 9.0 [3]
  • Konsole: Unicode 5.0 [4]

Different Unicode versions can lead to different results fortextwrap. Take U+231A (clock emoji) for example. In Unicode 8.0 it's NEUTRAL. Most implementations see it as width=1. In Unicode 9.0 it's changed to WIDE.

In CPython, handling multiple Unicode versions sounds impractical. IMO chasing the latest Unicode 9.0 is a good idea. If a terminal emulator is not compatible with Unicode 9.0, it should be fixed.

[1]https://sourceware.org/bugzilla/show_bug.cgi?id=20313
[2]tmux/tmux#515
[3]https://bugzilla.gnome.org/show_bug.cgi?id=771591
[4]https://github.com/KDE/konsole/blob/master/src/konsole_wcwidth.cpp#L55

@yan12125
Copy link
Contributor

Also, I would recommend rename cjk_foobar stuffs to unicode_foobar. CJK characters occupy a large portion on Unicode tables but not the whole.

@methane
Copy link
Member

@yan12125 sorry, my wording was wrong. (I'm not good at English).
When I said "I want glibc's wcwidth for UTF-8 at least", I meant
"if textwrap supports display width, I think it should be good as wcwidth at least.".
I didn't meant neither "textwrap should use wcwidth" nor "textwrap should implement algorithm
exactly same to wcwidth."

BTW, pull request is not place for discussion like this.
Please go tohttp://bugs.python.org/issue24665

Fullwidth and Wide CJK chars are double-width.
"""
import unicodedata
return unicodedata.east_asian_width(char) in ('F', 'W')

This comment was marked as resolved.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please,@duboviy or@Haypo, could you explain to me why/how a tuple could be slower than a frozenset.

def cjk_len(text):
"""Return the real width of text (its len if not a string).
"""
if not isinstance(text, str):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Strange case handling, maybe we should expect only string type text argument in this function...

Copy link
Author

@fgallairefgallaireMar 12, 2017
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Again: it's for an handy replacement of the built-inlen():
from textwrap import cjk_len as len

@@ -495,6 +495,7 @@ Lele Gaifax
Santiago Gala
Yitzchak Gale
Matthew Gallagher
Florent Gallaire

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Not sure if that makes lots of sense for such a small change (even though such changes are very welcome of course!)

There are commits from loads of people in this repo, but way less entries in this file (adding every contributor would quickly make the file pretty chaotic).

IMO it'd be better to only add yourself to this file when you contribute e.g. a major fix, new feature, etc

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I strongly disagree, if persons are missing they should be added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Florent definitely belongs in ACKS. We have been fairly liberal in adding people, some for as little as a well-crafted sentence or line of code.

@@ -428,6 +427,7 @@ def cjk_wide(char):
"""Return True if char is Fullwidth or Wide, False otherwise.
Fullwidth and Wide CJK chars are double-width.
"""
import unicodedata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think importing unicodedata, when wanted, for each char, is wrong. If not imported at the top, it could be imported as a global in TextWrapper.init when cjk is true. (I am assuming that the convenience functions all instantiate TextWrapper.

akruis pushed a commit to akruis/cpython that referenced this pull requestSep 9, 2017
…er shutdown codeFix the implementation to work as documented if a thread dies. NowStackless kills only tasklets with a nesting level > 0. During interpretershutdown stackless additionally kills daemon threads, if they execute Python code orswitch tasklets. This prevents access violations after clearing the thread states.Add a 1ms sleep for each daemon thread. This way the thread gets a betterchance to run. Thanks to Kristján Valur Jónsson for suggesting this improvement.Add a test case for a C assertion violation during thread shutdown.Add some explanatory comments to the shutdown test case.Thanks to Christian Tismer for the sugestion.https://bitbucket.org/stackless-dev/stackless/issues/81https://bitbucket.org/stackless-dev/stackless/issues/89(grafted from 2cc41427a347a1ebec4eedc3db06a3664e67f798, 1388a2003957,6140f5aaca2c and edc9b92ec457)
akruis pushed a commit to akruis/cpython that referenced this pull requestSep 9, 2017
@brettcannon
Copy link
Member

To try and help move older pull requests forward, we are going through and backfilling 'awaiting' labels on pull requests that are lacking the label. Based on the current reviews, the best we can tell in an automated fashion is that a core developer requested changes to be made to this pull request.

If/when the requested changes have been made, please leave a comment that says,I have made the requested changes; please review again. That will trigger a bot to flag this pull request as ready for a follow-up review.

@methane
Copy link
Member

I close this because I don't like APIs this PR has and the author seems satisfied by
putting his library on PyPI.

@methanemethane closed thisJul 8, 2018
jaraco pushed a commit that referenced this pull requestDec 2, 2022
jaraco added a commit to jaraco/cpython that referenced this pull requestFeb 17, 2023
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@methanemethanemethane requested changes

@terryjreedyterryjreedyterryjreedy left review comments

@vstinnervstinnervstinner requested changes

@duboviyduboviyduboviy requested changes

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

9 participants
@fgallaire@the-knights-who-say-ni@vstinner@yan12125@methane@brettcannon@duboviy@terryjreedy@zware

[8]ページ先頭

©2009-2025 Movatter.jp