Crate unicode_segmentation

Iterators which split strings on Grapheme Cluster, Word or Sentence boundaries, according to the Unicode Standard Annex #29 rules.

    extern crate unicode_segmentation;

    use unicode_segmentation::UnicodeSegmentation;

    fn main() {
        // Split into extended grapheme clusters.
        let s = "a̐éö̲\r\n";
        let g = UnicodeSegmentation::graphemes(s, true).collect::<Vec<&str>>();
        let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
        assert_eq!(g, b);

        // Keep only the segments that count as words.
        let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
        let w = s.unicode_words().collect::<Vec<&str>>();
        let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
        assert_eq!(w, b);

        // Split on every word boundary, keeping whitespace and punctuation.
        let s = "The quick (\"brown\")  fox";
        let w = s.split_word_bounds().collect::<Vec<&str>>();
        let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", "  ", "fox"];
        assert_eq!(w, b);
    }

no_std

unicode-segmentation does not depend on libstd, so it can be used in crates with the #![no_std] attribute.

crates.io

You can use this package in your project by adding the following to your Cargo.toml:

    [dependencies]
    unicode-segmentation = "1.7.1"

Structs

GraphemeCursor

Cursor-based segmenter for grapheme clusters.

GraphemeIndices

External iterator for grapheme clusters and byte offsets.

Graphemes

External iterator for a string’s grapheme clusters.

USentenceBoundIndices

External iterator for sentence boundaries and byte offsets.

USentenceBounds

External iterator for a string’s sentence boundaries.

UWordBoundIndices

External iterator for word boundaries and byte offsets.

UWordBounds

External iterator for a string’s word boundaries.

UnicodeSentences

An iterator over the substrings of a string which, after splitting the string on sentence boundaries, contain any characters with the Alphabetic property, or with General_Category=Number.

UnicodeWordIndices

An iterator over the substrings of a string which, after splitting the string on word boundaries, contain any characters with the Alphabetic property, or with General_Category=Number. This iterator also provides the byte offsets for each substring.

UnicodeWords

An iterator over the substrings of a string which, after splitting the string on word boundaries, contain any characters with the Alphabetic property, or with General_Category=Number.

Enums

GraphemeIncomplete

An error return indicating that not enough content was available in the provided chunk to satisfy the query, and that more content must be provided.

Constants

UNICODE_VERSION

The version of Unicode that this version of unicode-segmentation is based on.

Traits

UnicodeSegmentation

Methods for segmenting strings according to Unicode Standard Annex #29.

