Fixes to characters considered zero-width #34
Conversation
Manishearth commented Feb 10, 2024
This implements a specific standardized algorithm as documented in the readme. This rule around Default_Ignorable doesn't seem to be documented there. This is not a general purpose terminal width library.
Jules-Bertholet commented Feb 10, 2024 • edited
This library already differs from UAX 11 in several important ways:
Manishearth commented Feb 10, 2024
Hmm, yeah. I didn't originally write this, but I would like for the code to follow the spec first and offer these things as settings.
Jules-Bertholet commented Feb 10, 2024
UAX 11 doesn't really give a full, exact algorithm for getting a "width value" for a string. For example, control codes aren't even mentioned, nor are line breaks etc. So I think referring to other parts of the Unicode standard as well makes perfect sense.
Manishearth commented Feb 10, 2024
Hmm, that's fair. Will review later. I would ideally like someone to take a holistic view of this crate, compare with the specs, and document/add options. Haven't had time to do this myself ever since I inherited it.
Jules-Bertholet commented Feb 11, 2024 • edited
I've added some comments throughout the code, but here is a summary of the current rules (with this PR's changes included):
What's still not handled, or could be handled differently:
Jules-Bertholet commented Feb 11, 2024
The "Measurement" section of https://www.unicode.org/L2/L2023/23107-terminal-suppt.pdf highlights more problem cases.
Jules-Bertholet commented Feb 12, 2024
See also https://www.unicode.org/versions/Unicode15.1.0/ch05.pdf#G40095, "Characters Ignored for Display".
…rols as non-zero width
Jules-Bertholet commented Feb 12, 2024 • edited
Unicode §5.21 - "Characters Ignored for Display" - "Default Ignorable Code Point" says:
Software that interprets the interlinear annotation characters should probably do that processing before passing to
These characters are supposed to be completely invisible and ignored by rendering unless specially supported: https://www.unicode.org/faq/unsup_char.html#3.
Characters affected
Edit: Now also fixes #26
Edit 2: I've marked Prepended_Concatenation_Marks as not zero-width. This matches the behavior of glibc.
Edit 3: I've given U+115F HANGUL CHOSEONG FILLER back its width 2, because it's expected to be combined with other jamo to form a width-2 syllable block.
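For readers following along, here is a rough sketch of how the three edits above could interact as an ordered set of checks. It is not the crate's actual code: `sketch_width` and both helper functions are hypothetical names, the Default_Ignorable table is a small hand-picked sample rather than the full property, and everything not listed falls back to width 1 (so East Asian wide characters, controls, and combining marks are deliberately ignored here).

```rust
/// Hypothetical helper: code points with Prepended_Concatenation_Mark=Yes
/// (list as of Unicode 15.x; treat it as an assumption for other versions).
fn is_prepended_concatenation_mark(cp: u32) -> bool {
    matches!(
        cp,
        0x0600..=0x0605 | 0x06DD | 0x070F | 0x0890..=0x0891 | 0x08E2 | 0x110BD | 0x110CD
    )
}

/// Hypothetical helper: a SAMPLE of Default_Ignorable_Code_Point, not the
/// full property. It intentionally includes U+115F so the ordering of the
/// checks in `sketch_width` matters.
fn is_default_ignorable_sample(cp: u32) -> bool {
    matches!(
        cp,
        0x034F | 0x115F..=0x1160 | 0x200B..=0x200F | 0x2060..=0x2064 | 0xFE00..=0xFE0F | 0xFEFF
    )
}

/// Illustrative width function (an assumption, not the PR's implementation).
fn sketch_width(c: char) -> usize {
    let cp = c as u32;
    // Edit 3: U+115F HANGUL CHOSEONG FILLER keeps width 2, because it is
    // expected to lead a width-2 syllable block. Checked before the
    // Default_Ignorable rule, which would otherwise make it zero-width.
    if cp == 0x115F {
        return 2;
    }
    // Edit 2: Prepended_Concatenation_Marks are not zero-width.
    if is_prepended_concatenation_mark(cp) {
        return 1;
    }
    // Default_Ignorable_Code_Points are treated as zero-width.
    if is_default_ignorable_sample(cp) {
        return 0;
    }
    // Simplifying assumption: everything else is width 1.
    1
}
```

Under these assumptions, `sketch_width('\u{115F}')` is 2 while `sketch_width('\u{1160}')` (the vowel filler, also Default_Ignorable) is 0. The point of the sketch is only the ordering: the specific exceptions have to be tested before the general Default_Ignorable rule.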