gh-74902: add unicode grapheme cluster break algorithm #2673


Closed
13 commits
b79f969 add unicodedata.grapheme_cluster_break() (Vermeille, Jul 11, 2017)
c9a4211 generate unicodedata.c.h with clinic (Vermeille, Jul 13, 2017)
7f56b78 patchcheck (Vermeille, Jul 13, 2017)
a47de54 add the grapheme cluster break automaton (Vermeille, Jul 13, 2017)
c152171 add my name to Misc/ACKS (Vermeille, Jul 14, 2017)
b103be7 code review fixes (Vermeille, Aug 3, 2017)
c9848e2 rename break_graphemes to iter_graphemes (Vermeille, Aug 3, 2017)
2dee91e make GraphemeClusterIterator a GC type (Vermeille, Aug 3, 2017)
a5b3c10 allow iterating only over a range of indices (Vermeille, Jan 10, 2018)
ba1e9b7 Merge branch 'main' into grapheme_cluster_break (serhiy-storchaka, Dec 17, 2025)
ae06154 Add some tests. (serhiy-storchaka, Dec 17, 2025)
1965ed6 Save Grapheme_Cluster_Break for unassigned code points. (serhiy-storchaka, Dec 17, 2025)
164d19b Make "Any" the first entry. (serhiy-storchaka, Dec 17, 2025)
47 changes: 47 additions & 0 deletions Lib/test/test_unicodedata.py
@@ -848,5 +848,52 @@ class MyStr(str):
                    self.assertIs(type(normalize(form, MyStr(input_str))), str)


class GraphemeBreakTest(unittest.TestCase):
    @staticmethod
    def check_version(testfile):
        hdr = testfile.readline()
        return unicodedata.unidata_version in hdr
Review comment (Member):

What does the file header look like?

With string contains tests, I worry about things like "8.0" in "18.0" matching wrongly. Could the full line be compared?

Reply:

# GraphemeBreakTest-17.0.0.txt

We have the same check for normalization tests.
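
To make the reviewer's concern concrete, here is a minimal sketch that contrasts a substring check with a full-line comparison. It assumes the first line of the data file has exactly the form shown in the reply (`# GraphemeBreakTest-17.0.0.txt`); the helper names `loose_check` and `strict_check` and the `stem` parameter are hypothetical and are not part of the PR.

```python
def loose_check(header: str, version: str) -> bool:
    """Substring test, in the spirit of check_version() in the diff."""
    return version in header


def strict_check(header: str, version: str,
                 stem: str = "GraphemeBreakTest") -> bool:
    """Compare the whole header line (hypothetical stricter variant).

    Assumes the header is exactly '# <stem>-<version>.txt'.
    """
    return header.strip() == f"# {stem}-{version}.txt"


header = "# GraphemeBreakTest-17.0.0.txt\n"

# "7.0.0" is a substring of "17.0.0", so the loose check accepts it.
loose_false_positive = loose_check(header, "7.0.0")
# The full-line comparison rejects the wrong version and accepts the right one.
strict_rejects_wrong = strict_check(header, "7.0.0")
strict_accepts_right = strict_check(header, "17.0.0")
```

Whether the stricter form is worth it depends on how stable the header format is across Unicode releases; the reply notes that the normalization tests already use the same substring check.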


    @requires_resource('network')
    def test_grapheme_break(self):
        TESTDATAFILE = "auxiliary/GraphemeBreakTest.txt"
        TESTDATAURL = f"https://www.unicode.org/Public/{unicodedata.unidata_version}/ucd/{TESTDATAFILE}"

        # Hit the exception early
        try:
            testdata = open_urlresource(TESTDATAURL, encoding="utf-8",
                                        check=self.check_version)
        except PermissionError:
            self.skipTest(f"Permission error when downloading {TESTDATAURL} "
                          f"into the test data directory")
        except (OSError, HTTPException) as exc:
            self.skipTest(f"Failed to download {TESTDATAURL}: {exc}")

        with testdata:
            self.run_grapheme_break_tests(testdata, unicodedata)

    def run_grapheme_break_tests(self, testdata, ucd):
        part = None
        part1_data = set()

        for line in testdata:
            line, _, comment = line.partition('#')
            line = line.strip()
            if not line:
                continue
            comment = comment.strip()

            chunks = []
            for field in line.replace('×', ' ').split():
                if field == '÷':
                    chunks.append('')
                else:
                    chunks[-1] += chr(int(field, 16))
            self.assertEqual(chunks.pop(), '', line)
            with self.subTest(line):
                result = list(unicodedata.iter_graphemes(''.join(chunks)))
                self.assertEqual(result, chunks, comment)


if __name__ == "__main__":
    unittest.main()
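
For readers unfamiliar with the test data, the parsing loop in run_grapheme_break_tests() can be sketched as a standalone function. This assumes the line format the test relies on: `÷` marks a break, `×` marks no break, code points are hexadecimal, and anything after `#` is a comment. The function name `parse_break_test_line` and the sample line are illustrative only; the sample is written in that format, not copied from the actual file.

```python
def parse_break_test_line(line: str) -> list[str]:
    """Split one test line into the expected grapheme clusters.

    Mirrors the loop in the diff: every '÷' starts a new cluster and
    every hex field is appended to the current one. Returns an empty
    list for blank or comment-only lines.
    """
    data, _, _comment = line.partition('#')
    data = data.strip()
    if not data:
        return []

    chunks: list[str] = []
    # '×' (no break) carries no information for splitting, so drop it.
    for field in data.replace('×', ' ').split():
        if field == '÷':
            chunks.append('')
        else:
            chunks[-1] += chr(int(field, 16))

    # A well-formed line ends with '÷', which leaves one empty chunk.
    trailing = chunks.pop()
    if trailing != '':
        raise ValueError(f"line does not end with a break: {line!r}")
    return chunks


# Illustrative line: SPACE, COMBINING DIAERESIS (no break), then SPACE.
sample = "÷ 0020 × 0308 ÷ 0020 ÷\t# illustrative comment"
expected_clusters = parse_break_test_line(sample)
```

The test in the diff then joins these chunks into one string and checks that iterating over it with unicodedata.iter_graphemes() gives the same list back.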
1 change: 1 addition & 0 deletions Misc/ACKS
@@ -1662,6 +1662,7 @@ Victor Salgado
Rich Salz
Kevin Samborn
Adrian Sampson
Guillaume Sanchez
Nevada Sanchez
James Sanders
Ilya Sandler
99 changes: 98 additions & 1 deletion Modules/clinic/unicodedata.c.h

Some generated files are not rendered by default.
