Determine displayed width of char and str types according to Unicode Standard Annex #11 and other portions of the Unicode standard. See the Rules for determining width section for the exact rules.
This crate is #![no_std].
    use unicode_width::UnicodeWidthStr;

    let teststr = "Hello, world!";
    let width = UnicodeWidthStr::width(teststr);
    println!("{}", teststr);
    println!("The above string is {} columns wide.", width);

§"cjk" feature flag
This crate has one Cargo feature flag, "cjk" (enabled by default). It enables the UnicodeWidthChar::width_cjk and UnicodeWidthStr::width_cjk methods, which perform an alternate width calculation more suited to CJK contexts. The flag also unseals the UnicodeWidthChar and UnicodeWidthStr traits.
Disabling the flag (with default-features = false in Cargo.toml) will reduce the amount of static data needed by the crate.
    use unicode_width::UnicodeWidthStr;

    let teststr = "“𘀀”";
    assert_eq!(teststr.width(), 4);
    #[cfg(feature = "cjk")]
    assert_eq!(teststr.width_cjk(), 6);

§Rules for determining width
This crate currently uses the following rules to determine the width of a character or string, in order of decreasing precedence. These may be tweaked in the future.
- In the following cases, the width of a string differs from the sum of the widths of its constituent characters:
  - The sequence "\r\n" has width 1.
  - Emoji-specific ligatures:
    - Well-formed, fully-qualified emoji ZWJ sequences have width 2.
    - Emoji modifier sequences have width 2.
    - Emoji presentation sequences have width 2.
    - Outside of an East Asian context, text presentation sequences have width 1 if their base character:
      - Has the Emoji_Presentation property, and
      - Is not in the Enclosed Ideographic Supplement block.
  - '\u{2018}', '\u{2019}', '\u{201C}', and '\u{201D}' always have width 1 when followed by '\u{FE00}' or '\u{FE02}', and width 2 when followed by '\u{FE01}'.
  - Script-specific ligatures:
    - For all the following ligatures, the insertion of any number of default-ignorable combining marks anywhere in the sequence will not change the total width. In addition, for all non-Arabic ligatures, the insertion of any number of '\u{200D}' ZERO WIDTH JOINERs will not affect the width.
    - Arabic: A character sequence consisting of one character with Joining_Group=Lam, followed by any number of characters with Joining_Type=Transparent, followed by one character with Joining_Group=Alef, has total width 1. For example: لا, لآ, ڸا, لٟٞأ
    - Buginese: "\u{1A15}\u{1A17}\u{200D}\u{1A10}" (<a, -i> ya, ᨕᨗᨐ) has total width 1.
    - Hebrew: "א\u{200D}ל" (Alef-Lamed, אל) has total width 1.
    - Khmer: Coeng signs consisting of '\u{17D2}' followed by a character in '\u{1780}'..='\u{1782}' | '\u{1784}'..='\u{1787}' | '\u{1789}'..='\u{178C}' | '\u{178E}'..='\u{1793}' | '\u{1795}'..='\u{1798}' | '\u{179B}'..='\u{179D}' | '\u{17A0}' | '\u{17A2}' | '\u{17A7}' | '\u{17AB}'..='\u{17AC}' | '\u{17AF}' have width 0.
    - Kirat Rai: Any sequence canonically equivalent to '\u{16D68}', '\u{16D69}', or '\u{16D6A}' has total width 1.
    - Lisu: Tone letter combinations consisting of a character in the range '\u{A4F8}'..='\u{A4FB}' followed by a character in the range '\u{A4FC}'..='\u{A4FD}' have width 1. For example: ꓹꓼ
    - Old Turkic: "\u{10C32}\u{200D}\u{10C03}" (𐰲𐰃) has total width 1.
    - Tifinagh: A sequence of a Tifinagh consonant in the range '\u{2D31}'..='\u{2D65}' | '\u{2D6F}', followed by either '\u{2D7F}' TIFINAGH CONSONANT JOINER or '\u{200D}', followed by another Tifinagh consonant, has total width 1. For example: ⵏ⵿ⴾ
  - In an East Asian context only, <, =, or > have width 2 when followed by '\u{0338}' COMBINING LONG SOLIDUS OVERLAY. The two characters may be separated by any number of characters whose canonical decompositions consist only of characters meeting one of the following requirements:
    - Has Canonical_Combining_Class greater than 1, or
    - Is a default-ignorable combining mark.
- In all other cases, the width of the string equals the sum of its character widths:
  - '\u{2D7F}' TIFINAGH CONSONANT JOINER has width 1 (outside of the ligatures described previously).
  - '\u{115F}' HANGUL CHOSEONG FILLER and '\u{17A4}' KHMER INDEPENDENT VOWEL QAA have width 2.
  - '\u{17D8}' KHMER SIGN BEYYAL has width 3.
  - The following have width 0:
    - Characters with the Default_Ignorable_Code_Point property.
    - Characters with the Grapheme_Extend property.
    - Characters with a Hangul_Syllable_Type of Vowel_Jamo (V) or Trailing_Jamo (T).
    - The following Prepended_Concatenation_Marks:
    - Characters with the Grapheme_Cluster_Break=Prepend property, that are not also Prepended_Concatenation_Marks.
    - '\u{A8FA}' DEVANAGARI CARET.
  - Characters with an East_Asian_Width of Fullwidth or Wide have width 2.
  - Characters fulfilling all of the following conditions have width 2 in an East Asian context, and width 1 otherwise:
    - Fulfills one of the following conditions:
      - Has an East_Asian_Width of Ambiguous, or
      - Has a Line_Break of AI, or
      - Has a canonical decomposition to an Ambiguous character followed by '\u{0338}' COMBINING LONG SOLIDUS OVERLAY, or
      - Is '\u{0387}' GREEK ANO TELEIA; and
    - Does not have a General_Category of Letter or Modifier_Symbol.
  - All other characters have width 1.
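The precedence order described above (string-level sequences first, then fixed per-character widths, then the default of 1) can be illustrated with a deliberately tiny sketch. It is not this crate's implementation: toy_char_width and toy_str_width are hypothetical names, the code uses only the standard library, and it covers just a handful of the listed rules (the "\r\n" sequence, the Lisu tone letter combination, and a few fixed per-character widths). Every character it does not recognise is assumed to have width 1, which is wrong for wide, ambiguous, and most zero-width characters.

```rust
/// Toy per-character width covering a few of the rules listed above.
/// Hypothetical helper for illustration; not part of unicode-width.
fn toy_char_width(c: char) -> usize {
    match c {
        // TIFINAGH CONSONANT JOINER, when not part of a ligature.
        '\u{2D7F}' => 1,
        // HANGUL CHOSEONG FILLER and KHMER INDEPENDENT VOWEL QAA.
        '\u{115F}' | '\u{17A4}' => 2,
        // KHMER SIGN BEYYAL.
        '\u{17D8}' => 3,
        // ZERO WIDTH JOINER (a Default_Ignorable_Code_Point) and DEVANAGARI CARET.
        '\u{200D}' | '\u{A8FA}' => 0,
        // Simplifying assumption: everything else is treated as width 1.
        _ => 1,
    }
}

/// Toy string width: string-level sequences are checked first, and only
/// when none matches does the function fall back to the per-character width.
fn toy_str_width(s: &str) -> usize {
    let mut total = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        match (c, chars.peek().copied()) {
            // The sequence "\r\n" has width 1.
            ('\r', Some('\n')) => {
                chars.next(); // consume the '\n'
                total += 1;
            }
            // Lisu tone letter combination: U+A4F8..=U+A4FB followed by
            // U+A4FC..=U+A4FD has width 1.
            ('\u{A4F8}'..='\u{A4FB}', Some('\u{A4FC}'..='\u{A4FD}')) => {
                chars.next(); // consume the second tone letter
                total += 1;
            }
            // No string-level rule applies: use the per-character width.
            _ => total += toy_char_width(c),
        }
    }
    total
}
```

With these definitions, toy_str_width("a\r\nb") evaluates to 3, because "\r\n" is matched as a unit before any per-character lookup takes place.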
§Canonical equivalence
Canonically equivalent strings are assigned the same width (CJK and non-CJK).
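As a small illustration of this guarantee, consider "é" written as the single code point U+00E9 and as "e" followed by U+0301 COMBINING ACUTE ACCENT; the two strings are canonically equivalent. The sketch below is a standard-library-only toy, not the crate's code: toy_width is a hypothetical function that assumes width 0 for the Combining Diacritical Marks block (U+0300..=U+036F, whose characters have the Grapheme_Extend property) and width 1 for everything else.

```rust
/// Toy width function for illustration only; not part of unicode-width.
/// Assumption: only U+0300..=U+036F is treated as zero-width.
fn toy_width(s: &str) -> usize {
    s.chars()
        .map(|c| match c {
            // Combining Diacritical Marks: zero width (Grapheme_Extend).
            '\u{0300}'..='\u{036F}' => 0,
            // Simplifying assumption: all other characters have width 1.
            _ => 1,
        })
        .sum()
}
```

Under these assumptions, toy_width("\u{00E9}") and toy_width("e\u{0301}") both evaluate to 1: the precomposed form is one width-1 character, and the decomposed form is a width-1 base character plus a width-0 combining mark.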
Constants§
- UNICODE_VERSION - The version of Unicode that this version of unicode-width is based on.
Traits§
- UnicodeWidthChar - Methods for determining displayed width of Unicode characters.
- UnicodeWidthStr - Methods for determining displayed width of Unicode strings.