Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-74902: add unicode grapheme cluster break algorithm#2673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed

Conversation

@Vermeille
Copy link
Contributor

@VermeilleVermeille commentedJul 11, 2017
edited by bedevere-bot
Loading

I have added GraphemeBreakProperty to UnicodeData.
An automaton to compute the rules for breaking grapheme clusters according to TR29 is included. It passes all the tests provided in GraphemeBreakTests.txt.

https://bugs.python.org/issue30717

rsalmei, arhadthedev, cceyda, and woile reacted with hooray emoji
@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed thePSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username onbugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please followthe steps outlined in the CPython devguide to rectify this issue.

Thanks again to your contribution and we look forward to looking at it!

@VermeilleVermeille changed the titleWIP: add grapheme cluster break algorithmbpo-30717: WIP: add grapheme cluster break algorithmJul 11, 2017
@VermeilleVermeilleforce-pushed thegrapheme_cluster_break branch from0f82f82 to62fd6e0CompareJuly 13, 2017 23:29
@VermeilleVermeille changed the titlebpo-30717: WIP: add grapheme cluster break algorithmbpo-30717: add grapheme cluster break algorithmJul 14, 2017
@VermeilleVermeille changed the titlebpo-30717: add grapheme cluster break algorithmbpo-30717: add unicode grapheme cluster break algorithmJul 14, 2017
@VermeilleVermeilleforce-pushed thegrapheme_cluster_break branch from62fd6e0 toa47de54CompareJuly 14, 2017 03:21
@Vermeille
Copy link
ContributorAuthor

Hello? Someone here?

@serhiy-storchakaserhiy-storchaka added the type-featureA feature request or enhancement labelAug 3, 2017
0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT,
"Internal grapheme cluster iterator object.", /* tp_doc */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think the words "internal" and "object" are redundant.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

"Internal", "iterator" and "object" are all redundant. "Grapheme cluster iterator" seems just right. What do you think?

erlend-aasland reacted with thumbs up emoji
@Vermeille
Copy link
ContributorAuthor

Sorry for the long wait.

Are we good concerning the changes? Anything to add?

@brettcannon
Copy link
Member

To try and help move older pull requests forward, we are going through and backfilling 'awaiting' labels on pull requests that are lacking the label. Based on the current reviews, the best we can tell in an automated fashion is that a core developer requested changes to be made to this pull request.

If/when the requested changes have been made, please leave a comment that says,I have made the requested changes; please review again. That will trigger a bot to flag this pull request as ready for a follow-up review.

@csabella
Copy link
Contributor

@Vermeille, please take a look at the most recent comments on the bug tracker for this issue. It looks like the suggested path forward is different than the solution you proposed here. Thanks!

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actionsgithub-actionsbot added the staleStale PR or inactive for long period of time. labelFeb 20, 2022
@github-actionsgithub-actionsbot removed the staleStale PR or inactive for long period of time. labelJul 28, 2022
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actionsgithub-actionsbot added the staleStale PR or inactive for long period of time. labelAug 29, 2022
@arhadthedevarhadthedev changed the titlebpo-30717: add unicode grapheme cluster break algorithmgh-74902: add unicode grapheme cluster break algorithmFeb 14, 2023
@github-actionsgithub-actionsbot removed the staleStale PR or inactive for long period of time. labelFeb 15, 2023
@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actionsgithub-actionsbot added the staleStale PR or inactive for long period of time. labelMar 18, 2023
@bedevere-app
Copy link

Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply theskip news label instead.

@bedevere-app
Copy link

Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply theskip news label instead.

Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I apologize that it took so long to start reviewing this PR seriously.

Now we need this algorithm to calculate the width of text in columns, which is needed to support wide characters in many parts of the stdlib (REPL, tracebacks, etc). So we will add its implementation anyway. If you are busy or have lost interest, I will finish this work myself (keeping your credit), but if you are still interested, I would be happy to work together.

I wonder, what is the source of the state machine table? Did you created it from the original rules or from the table in GraphemeBreakTest.html? Or copied it from other source? I afraid that it is outdated and only supports legacy grapheme clusters. I can fix this, but maybe you already have a ready solution?


self: self
unistr: unicode
start: int = 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It should be Py_ssize_t. Some other variables should be Py_ssize_t, not int.

self: self
unistr: unicode
start: int = 0
end: Py_ssize_t(c_default="PY_SSIZE_T_MAX - 1") = sys.maxsize

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It should be PY_SSIZE_T_MAX.

Although I am not sure that theend parameter is needed. The user can simply stop iteration at any time.

@staticmethod
def check_version(testfile):
hdr = testfile.readline()
return unicodedata.unidata_version in hdr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

What does the file header look like?

With string contains tests, I worry about things like"8.0" in "18.0" matching wrongly. Could the full line be compared?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

# GraphemeBreakTest-17.0.0.txt

We have the same check for normalization tests.

@bedevere-app
Copy link

Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply theskip news label instead.

@serhiy-storchakaserhiy-storchaka marked this pull request as draftDecember 17, 2025 22:16
@serhiy-storchakaserhiy-storchaka self-assigned thisDec 17, 2025
@github-actionsgithub-actionsbot removed the staleStale PR or inactive for long period of time. labelDec 18, 2025
@AA-Turner
Copy link
Member

Closing in favour of#143076.

A

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@merwokmerwokmerwok left review comments

@serhiy-storchakaserhiy-storchakaserhiy-storchaka left review comments

@methanemethanemethane requested changes

Labels

type-featureA feature request or enhancement

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

9 participants

@Vermeille@the-knights-who-say-ni@brettcannon@csabella@AA-Turner@methane@merwok@serhiy-storchaka@ezio-melotti

[8]ページ先頭

©2009-2026 Movatter.jp