Primitive Type char

1.0.0


A character type.

The char type represents a single character. More specifically, since ‘character’ isn’t a well-defined concept in Unicode, char is a ‘Unicode scalar value’.

This documentation describes a number of methods and trait implementations on the char type. For technical reasons, there is additional, separate documentation in the std::char module as well.

§Validity and Layout

A char is a ‘Unicode scalar value’, which is any ‘Unicode code point’ other than a surrogate code point. This has a fixed numerical definition: code points are in the range 0 to 0x10FFFF, inclusive. Surrogate code points, used by UTF-16, are in the range 0xD800 to 0xDFFF.

No char may be constructed, whether as a literal or at runtime, that is not a Unicode scalar value. Violating this rule causes undefined behavior.

```rust
// Each of these is a compiler error, so this example does not compile
['\u{D800}', '\u{DFFF}', '\u{110000}'];
```

```rust
// Panics; from_u32 returns None.
char::from_u32(0xDE01).unwrap();
```

```rust
// Undefined behavior
let _ = unsafe { char::from_u32_unchecked(0x110000) };
```

Unicode scalar values are also the exact set of values that may be encoded in UTF-8. Because char values are Unicode scalar values and functions may assume incoming str values are valid UTF-8, it is safe to store any char in a str or read any character from a str as a char.
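The round trip described above can be shown with a minimal sketch. It is not taken from the standard library, and `round_trip` is a hypothetical helper. The sketch pushes arbitrary chars into a String and reads them back with str::chars. The values come back unchanged, and the String's byte length reflects each char's UTF-8 encoding.

```rust
// Hypothetical helper for illustration: encodes chars into a String (UTF-8)
// and decodes them back into chars.
fn round_trip(input: &[char]) -> (String, Vec<char>) {
    let mut s = String::new();
    for &c in input {
        // Every char is a Unicode scalar value, so this always yields valid UTF-8.
        s.push(c);
    }
    // str::chars yields each encoded scalar value back as a char.
    let decoded: Vec<char> = s.chars().collect();
    (s, decoded)
}
```

For example, calling round_trip on 'a', 'é', '€' and '😀' produces a 10-byte String (1 + 2 + 3 + 4 bytes) and the same four chars.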

The gap in valid char values is understood by the compiler, so in the example below the two ranges are understood to cover the whole range of possible char values and there is no error for a non-exhaustive match.

```rust
let c: char = 'a';
match c {
    '\0'..='\u{D7FF}' => false,
    '\u{E000}'..='\u{10FFFF}' => true,
};
```

All Unicode scalar values are valid char values, but not all of them represent a real character. Many Unicode scalar values are not currently assigned to a character, but may be in the future (“reserved”); some will never be a character (“noncharacters”); and some may be given different meanings by different users (“private use”).
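Validity is purely numeric and does not depend on whether a code point is assigned. The following minimal sketch uses a hypothetical helper, `is_scalar_value`, built on char::from_u32. Three kinds of code point are all accepted:

- a noncharacter (U+FFFF),
- a private-use code point (U+E000),
- a currently unassigned code point (U+0378).

A surrogate (U+D800) and anything above 0x10FFFF are rejected.

```rust
// Hypothetical helper for illustration: true exactly when `code` is a
// Unicode scalar value, i.e. when char::from_u32 accepts it.
fn is_scalar_value(code: u32) -> bool {
    char::from_u32(code).is_some()
}
```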

char is guaranteed to have the same size, alignment, and function call ABI as u32 on all platforms.

```rust
use std::alloc::Layout;

assert_eq!(Layout::new::<char>(), Layout::new::<u32>());
```

§Representation

char is always four bytes in size. This is a different representation than a given character would have as part of a String. For example:

```rust
use std::mem::size_of;

let v = vec!['h', 'e', 'l', 'l', 'o'];

// five elements times four bytes for each element
assert_eq!(20, v.len() * size_of::<char>());

let s = String::from("hello");

// five elements times one byte per element
assert_eq!(5, s.len() * size_of::<u8>());
```

As always, remember that a human intuition for ‘character’ might not map to Unicode’s definitions. For example, despite looking similar, the precomposed ‘é’ (U+00E9) is one Unicode code point, while ‘é’ written as ‘e’ followed by a combining acute accent (U+0065 U+0301) is two Unicode code points:

```rust
// precomposed "é"
let mut chars = "\u{00e9}".chars();
// U+00e9: 'latin small letter e with acute'
assert_eq!(Some('\u{00e9}'), chars.next());
assert_eq!(None, chars.next());

// "e" followed by a combining acute accent
let mut chars = "e\u{0301}".chars();
// U+0065: 'latin small letter e'
assert_eq!(Some('\u{0065}'), chars.next());
// U+0301: 'combining acute accent'
assert_eq!(Some('\u{0301}'), chars.next());
assert_eq!(None, chars.next());
```

This means that the contents of the first string above will fit into a char while the contents of the second string will not. Trying to create a char literal with the contents of the second string gives an error:

```text
error: character literal may only contain one codepoint: 'é'
let c = 'é';
        ^^^
```

Another implication of the 4-byte fixed size of a char is that per-char processing can end up using a lot more memory:

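```rust
use std::mem::size_of_val;

// "❤️" is U+2764 followed by U+FE0F (variation selector-16), 3 UTF-8 bytes each
let s = String::from("love: \u{2764}\u{FE0F}");
let v: Vec<char> = s.chars().collect();

// 6 ASCII bytes + 3 + 3
assert_eq!(12, size_of_val(&s[..]));
// 8 chars times 4 bytes each
assert_eq!(32, size_of_val(&v[..]));
```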


§Implementations

impl char

pub const MIN: char = '\0'

§Examples

pub const MAX: char = '\u{10FFFF}'

§Examples
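The following is a minimal sketch, not the standard library's own example. The hypothetical helper `next_char` uses char::MAX and from_u32 to step to the next scalar value. It skips the surrogate gap and returns None past the maximum.

```rust
// Hypothetical helper for illustration: returns the next Unicode scalar
// value after `c`, jumping over the surrogate gap, or None after char::MAX.
fn next_char(c: char) -> Option<char> {
    let next = match c as u32 {
        // U+D7FF is followed by surrogates, which are not valid chars.
        0xD7FF => 0xE000,
        // c as u32 is at most 0x10FFFF, so adding 1 cannot overflow.
        n => n + 1,
    };
    // For char::MAX this is 0x110000, which from_u32 rejects.
    char::from_u32(next)
}
```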

pub const MAX_LEN_UTF8: usize = 4

pub const MAX_LEN_UTF16: usize = 2

pub const REPLACEMENT_CHARACTER: char = '\u{FFFD}'

pub const UNICODE_VERSION: (u8, u8, u8) = crate::unicode::UNICODE_VERSION

pub fn decode_utf16<I>(iter: I) -> DecodeUtf16<<I as IntoIterator>::IntoIter>
where
    I: IntoIterator<Item = u16>,

§Examples
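The following is a minimal sketch of lossy decoding and is not the standard library's own example. The hypothetical helper `decode_lossy` maps each decoding error to REPLACEMENT_CHARACTER (U+FFFD) and collects the rest into a String. Unpaired surrogates therefore do not abort the decode.

```rust
// Hypothetical helper for illustration: decodes UTF-16 code units, replacing
// each unpaired surrogate with U+FFFD instead of failing.
fn decode_lossy(units: &[u16]) -> String {
    char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```

For example, the input can combine three things:

- a valid surrogate pair for U+1D11E,
- a lone low surrogate in the middle,
- a lone high surrogate at the end.

Decoding yields the musical symbol followed by "mus\u{FFFD}ic\u{FFFD}".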

pub const fn from_u32(i: u32) -> Option<char>

§Examples
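The following minimal sketch uses a hypothetical helper, `code_points_to_string`. It converts a list of raw code points with from_u32 and collects the results into Option<String>. A single surrogate or out-of-range value makes the whole conversion return None.

```rust
// Hypothetical helper for illustration: converts raw code points to a String,
// returning None if any of them is not a Unicode scalar value.
fn code_points_to_string(codes: &[u32]) -> Option<String> {
    // Collecting an iterator of Option<char> into Option<String>
    // stops at the first None.
    codes.iter().map(|&code| char::from_u32(code)).collect()
}
```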

pub const unsafe fn from_u32_unchecked(i: u32) -> char

§Safety

§Examples
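This minimal sketch assumes that the caller's obligation is the validity rule from §Validity and Layout: the argument must be a Unicode scalar value. The hypothetical helper `from_u32_checked_by_hand` performs those range checks itself before calling from_u32_unchecked. It therefore behaves like from_u32.

```rust
// Hypothetical helper for illustration: performs the range checks itself
// so that the call to from_u32_unchecked is sound.
fn from_u32_checked_by_hand(i: u32) -> Option<char> {
    let is_surrogate = (0xD800..=0xDFFF).contains(&i);
    if i <= 0x10FFFF && !is_surrogate {
        // SAFETY: `i` is at most 0x10FFFF and not a surrogate code point,
        // so it is a Unicode scalar value.
        Some(unsafe { char::from_u32_unchecked(i) })
    } else {
        None
    }
}
```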

pub const fn from_digit(num: u32, radix: u32) -> Option<char>

§Panics

§Examples
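The following minimal sketch formats an integer in an arbitrary radix using from_digit, which produces lowercase letters for digits 10 and above. The helper `to_base` is hypothetical. The sketch assumes the radix is between 2 and 36, because from_digit panics for a radix above 36.

```rust
// Hypothetical helper for illustration: formats `n` in the given radix.
// Assumes 2 <= radix <= 36; from_digit panics for a radix above 36.
fn to_base(mut n: u32, radix: u32) -> String {
    let mut digits = Vec::new();
    // Produce digits least-significant first; the loop runs at least once,
    // so 0 is formatted as "0".
    loop {
        // n % radix < radix, so from_digit always returns Some here.
        digits.push(char::from_digit(n % radix, radix).unwrap());
        n /= radix;
        if n == 0 {
            break;
        }
    }
    digits.iter().rev().collect()
}
```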

pub const fn is_digit(self, radix: u32) -> bool

§Panics

§Examples
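The following minimal sketch uses a hypothetical helper, `is_hex_literal`, to check whether a string consists only of hexadecimal digits. Because is_digit accepts letters in either case, "1aF" passes, while "0x1F" fails on the 'x'.

```rust
// Hypothetical helper for illustration: true if `s` is a non-empty run of
// hexadecimal digits (is_digit accepts both 'a'..='f' and 'A'..='F').
fn is_hex_literal(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_digit(16))
}
```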

pub const fn to_digit(self, radix: u32) -> Option<u32>

§Errors

§Panics

§Examples
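The following minimal sketch builds a small integer parser on to_digit and uses the ? operator to propagate None. The helper `parse_radix` is hypothetical; for real code, u32::from_str_radix already covers this. The sketch rejects an empty string, any character that is not a digit in the radix, and overflow.

```rust
// Hypothetical helper for illustration: parses `s` as an unsigned number in
// the given radix, returning None on an empty string, an invalid digit, or overflow.
fn parse_radix(s: &str, radix: u32) -> Option<u32> {
    if s.is_empty() {
        return None;
    }
    let mut acc: u32 = 0;
    for c in s.chars() {
        // to_digit returns None for characters that are not digits in this radix.
        let digit = c.to_digit(radix)?;
        acc = acc.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(acc)
}
```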

pub fn escape_unicode(self) -> EscapeUnicode

§Examples
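The following minimal sketch uses a hypothetical helper, `escape_all`. Since EscapeUnicode is an iterator of chars, the escapes can be flattened straight into a String. Each char becomes a \u{...} escape with lowercase hex digits and no leading zeros, so 'a' becomes \u{61} and a newline becomes \u{a}.

```rust
// Hypothetical helper for illustration: replaces every char of `s` with its
// \u{...} escape.
fn escape_all(s: &str) -> String {
    // EscapeUnicode is an iterator of chars, so it can be flattened directly.
    s.chars().flat_map(char::escape_unicode).collect()
}
```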

pub fn escape_debug(self) -> EscapeDebug

§Examples

pub fn escape_default(self) -> EscapeDefault

§Examples
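The following minimal sketch contrasts escape_debug and escape_default through a hypothetical helper, `both_escapes`. Both escape control characters such as '\n'. escape_debug leaves a printable non-ASCII character such as 'é' as it is. escape_default turns everything outside printable ASCII into a \u{...} escape.

```rust
// Hypothetical helper for illustration: returns the escape_debug and
// escape_default renderings of `c`, in that order.
fn both_escapes(c: char) -> (String, String) {
    (c.escape_debug().to_string(), c.escape_default().to_string())
}
```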

pub const fn len_utf8(self) -> usize

§Examples
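The following minimal sketch uses a hypothetical helper, `utf8_len`. It sums len_utf8 over a slice of chars to predict how many bytes they will occupy in a String. The result matches the String's own len() once the chars are collected.

```rust
// Hypothetical helper for illustration: the number of bytes `chars` would
// occupy once encoded as UTF-8.
fn utf8_len(chars: &[char]) -> usize {
    chars.iter().map(|c| c.len_utf8()).sum()
}
```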

pub const fn len_utf16(self) -> usize

§Examples
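The following minimal sketch uses a hypothetical helper, `utf16_len`. It counts UTF-16 code units per char, giving 1 for the Basic Multilingual Plane and 2 for a surrogate pair. The total agrees with str::encode_utf16().count().

```rust
// Hypothetical helper for illustration: the number of UTF-16 code units
// needed to encode `s`.
fn utf16_len(s: &str) -> usize {
    s.chars().map(char::len_utf16).sum()
}
```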

pub const fn encode_utf8(self, dst: &mut [u8]) -> &mut str

§Panics

§Examples
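The following minimal sketch uses a hypothetical helper, `encode_to_array`. It encodes a char into a four-byte stack buffer, four bytes being the maximum UTF-8 length of a char, so no heap allocation is needed. The helper reports how many bytes were written; for example, '€' encodes as E2 82 AC.

```rust
// Hypothetical helper for illustration: encodes `c` into a four-byte stack
// buffer and returns the buffer together with the number of bytes written.
fn encode_to_array(c: char) -> ([u8; 4], usize) {
    let mut buf = [0u8; 4];
    // encode_utf8 returns the written prefix of `buf` as a &mut str.
    let written = c.encode_utf8(&mut buf).len();
    (buf, written)
}
```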

pub const fn encode_utf16(self, dst: &mut [u16]) -> &mut [u16]

§Panics

§Examples
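The following minimal sketch uses a hypothetical helper, `to_utf16_units`. It encodes into a two-unit buffer, which is enough for any char. A character outside the Basic Multilingual Plane, such as U+1D11E, comes out as the surrogate pair D834 DD1E.

```rust
// Hypothetical helper for illustration: encodes `c` as UTF-16 and returns
// the code units (one, or two for a surrogate pair).
fn to_utf16_units(c: char) -> Vec<u16> {
    let mut buf = [0u16; 2];
    c.encode_utf16(&mut buf).to_vec()
}
```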

pub fn is_alphabetic(self) -> bool

§Examples

pub const fn is_lowercase(self) -> bool

§Examples

pub const fn is_uppercase(self) -> bool

§Examples

pub const fn is_whitespace(self) -> bool

§Examples

pub fn is_alphanumeric(self) -> bool

§Examples

pub fn is_control(self) -> bool

§Examples

pub fn is_numeric(self) -> bool

§Examples
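The following minimal sketch combines several of the Unicode-aware predicates into a coarse classifier. The helper `classify` and its category names are hypothetical, and the first matching test wins. Note that the no-break space U+00A0 counts as whitespace and '½' counts as numeric.

```rust
// Hypothetical helper for illustration: a coarse classification using the
// char predicates; the first matching test wins.
fn classify(c: char) -> &'static str {
    if c.is_whitespace() {
        "whitespace"
    } else if c.is_control() {
        "control"
    } else if c.is_numeric() {
        "numeric"
    } else if c.is_alphabetic() {
        "alphabetic"
    } else {
        "other"
    }
}
```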

pub fn to_lowercase(self) -> ToLowercase

§Examples
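to_lowercase returns an iterator rather than a single char because some characters lowercase to more than one char. In the following minimal sketch, the hypothetical helper `lower` collects the result into a String. 'İ' (U+0130) becomes 'i' followed by a combining dot above (U+0307).

```rust
// Hypothetical helper for illustration: lowercases a single char. The result
// is a String because some chars lowercase to more than one char.
fn lower(c: char) -> String {
    c.to_lowercase().collect()
}
```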

pub fn to_uppercase(self) -> ToUppercase

§Examples
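The following minimal sketch shows the uppercase mapping, which can likewise produce several chars. The hypothetical helper `upper` collects the result into a String, and 'ß' uppercases to the two-char string "SS".

```rust
// Hypothetical helper for illustration: uppercases a single char. The result
// is a String because some chars uppercase to more than one char.
fn upper(c: char) -> String {
    c.to_uppercase().collect()
}
```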

§Note on locale

pub const fn is_ascii(&self) -> bool

§Examples

pub const fn as_ascii(&self) -> Option<AsciiChar>

pub const unsafe fn as_ascii_unchecked(&self) -> AsciiChar

§Safety

pub const fn to_ascii_uppercase(&self) -> char

§Examples
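The following minimal sketch uses a hypothetical helper, `ascii_upper`. It applies to_ascii_uppercase to every char of a string. Only ASCII letters change, so 'é' stays as it is, whereas the full Unicode str::to_uppercase turns it into 'É'.

```rust
// Hypothetical helper for illustration: uppercases only the ASCII letters of
// `s`; every other char, including 'é', is left unchanged.
fn ascii_upper(s: &str) -> String {
    s.chars().map(|c| c.to_ascii_uppercase()).collect()
}
```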

pub const fn to_ascii_lowercase(&self) -> char

§Examples

pub const fn eq_ignore_ascii_case(&self, other: &char) -> bool

§Examples
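The following minimal sketch uses a hypothetical helper, `eq_ascii_ci`, to compare two strings char by char with eq_ignore_ascii_case. In practice str::eq_ignore_ascii_case already does this for whole strings, so the sketch only demonstrates the per-char method. Non-ASCII letters such as 'é' and 'É' still compare as different.

```rust
// Hypothetical helper for illustration: ASCII case-insensitive equality of
// two strings, compared char by char.
fn eq_ascii_ci(a: &str, b: &str) -> bool {
    a.chars().count() == b.chars().count()
        && a.chars().zip(b.chars()).all(|(x, y)| x.eq_ignore_ascii_case(&y))
}
```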

pub const fn make_ascii_uppercase(&mut self)

§Examples
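The following minimal sketch uses a hypothetical helper, `shout_in_place`. It calls make_ascii_uppercase on each element of a mutable char slice, modifying the buffer in place. Non-ASCII chars and punctuation are left untouched.

```rust
// Hypothetical helper for illustration: uppercases the ASCII letters of a
// char buffer in place, leaving other chars untouched.
fn shout_in_place(chars: &mut [char]) {
    for c in chars.iter_mut() {
        c.make_ascii_uppercase();
    }
}
```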

pub const fn make_ascii_lowercase(&mut self)
