Crate unicode_width

Determine displayed width of char and str types according to Unicode Standard Annex #11 and other portions of the Unicode standard. See the Rules for determining width section for the exact rules.

This crate is #![no_std].

```rust
use unicode_width::UnicodeWidthStr;

let teststr = "Hello, world!";
let width = UnicodeWidthStr::width(teststr);
println!("{}", teststr);
println!("The above string is {} columns wide.", width);
```

§"cjk" feature flag

This crate has one Cargo feature flag, "cjk" (enabled by default). It enables UnicodeWidthChar::width_cjk and UnicodeWidthStr::width_cjk, which perform an alternate width calculation more suited to CJK contexts. The flag also unseals the UnicodeWidthChar and UnicodeWidthStr traits.

Disabling the flag (with default-features = false on the unicode-width dependency in Cargo.toml) will reduce the amount of static data needed by the crate.

```rust
use unicode_width::UnicodeWidthStr;

let teststr = "“𘀀”";
assert_eq!(teststr.width(), 4);
#[cfg(feature = "cjk")]
assert_eq!(teststr.width_cjk(), 6);
```

§Rules for determining width

This crate currently uses the following rules to determine the width of a character or string, in order of decreasing precedence. These may be tweaked in the future.

  1. In the following cases, the width of a string differs from the sum of the widths of its constituent characters:
    • The sequence "\r\n" has width 1.
    • Emoji-specific ligatures:
    • '\u{2018}', '\u{2019}', '\u{201C}', and '\u{201D}' always have width 1 when followed by '\u{FE00}' or '\u{FE02}', and width 2 when followed by '\u{FE01}'.
    • Script-specific ligatures:
      • For all the following ligatures, the insertion of any number of default-ignorable combining marks anywhere in the sequence will not change the total width. In addition, for all non-Arabic ligatures, the insertion of any number of '\u{200D}' ZERO WIDTH JOINERs will not affect the width.
      • Arabic: A character sequence consisting of one character with Joining_Group=Lam, followed by any number of characters with Joining_Type=Transparent, followed by one character with Joining_Group=Alef, has total width 1. For example: لا, لآ, ڸا, لٟٞأ
      • Buginese: "\u{1A15}\u{1A17}\u{200D}\u{1A10}" (<a, -i> ya, ᨕᨗ‍ᨐ) has total width 1.
      • Hebrew: "א\u{200D}ל" (Alef-Lamed, א‍ל) has total width 1.
      • Khmer: Coeng signs consisting of '\u{17D2}' followed by a character in '\u{1780}'..='\u{1782}' | '\u{1784}'..='\u{1787}' | '\u{1789}'..='\u{178C}' | '\u{178E}'..='\u{1793}' | '\u{1795}'..='\u{1798}' | '\u{179B}'..='\u{179D}' | '\u{17A0}' | '\u{17A2}' | '\u{17A7}' | '\u{17AB}'..='\u{17AC}' | '\u{17AF}' have width 0.
      • Kirat Rai: Any sequence canonically equivalent to '\u{16D68}', '\u{16D69}', or '\u{16D6A}' has total width 1.
      • Lisu: Tone letter combinations consisting of a character in the range '\u{A4F8}'..='\u{A4FB}' followed by a character in the range '\u{A4FC}'..='\u{A4FD}' have width 1. For example: ꓹꓼ
      • Old Turkic:"\u{10C32}\u{200D}\u{10C03}" (𐰲‍𐰃) has total width 1.
      • Tifinagh: A sequence of a Tifinagh consonant in the range '\u{2D31}'..='\u{2D65}' | '\u{2D6F}', followed by either '\u{2D7F}' TIFINAGH CONSONANT JOINER or '\u{200D}', followed by another Tifinagh consonant, has total width 1. For example: ⵏ⵿ⴾ
    • In an East Asian context only, <, =, or > have width 2 when followed by '\u{0338}' COMBINING LONG SOLIDUS OVERLAY. The two characters may be separated by any number of characters whose canonical decompositions consist only of characters meeting one of the following requirements:
  2. In all other cases, the width of the string equals the sum of its character widths:
    1. '\u{2D7F}' TIFINAGH CONSONANT JOINER has width 1 (outside of the ligatures described previously).
    2. '\u{115F}' HANGUL CHOSEONG FILLER and '\u{17A4}' KHMER INDEPENDENT VOWEL QAA have width 2.
    3. '\u{17D8}' KHMER SIGN BEYYAL has width 3.
    4. The following have width 0:
    5. Characters with an East_Asian_Width of Fullwidth or Wide have width 2.
    6. Characters fulfilling all of the following conditions have width 2 in an East Asian context, and width 1 otherwise:
    7. All other characters have width 1.

§Canonical equivalence

Canonically equivalent strings are assigned the same width (CJK and non-CJK).

§Constants

UNICODE_VERSION
The version of Unicode that this version of unicode-width is based on.

§Traits

UnicodeWidthChar
Methods for determining displayed width of Unicode characters.
UnicodeWidthStr
Methods for determining displayed width of Unicode strings.
