Defined in header <cuchar> |
std::size_t mbrtoc32( char32_t* pc32, const char* s, std::size_t n, std::mbstate_t* ps ); | (since C++11) |
Converts a narrow multibyte character to its UTF-32 character representation.
If s is not a null pointer, inspects at most n bytes of the multibyte character string, beginning with the byte pointed to by s, to determine the number of bytes necessary to complete the next multibyte character (including any shift sequences). If the function determines that the next multibyte character in s is complete and valid, converts it to the corresponding 32-bit character and stores it in *pc32 (if pc32 is not null).
If the multibyte character in *s corresponds to a multi-char32_t sequence (not possible with UTF-32), then after the first call to this function, *ps is updated in such a way that the next calls to mbrtoc32 will write out the additional char32_t, without considering *s.
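Because the conversion state lives in *ps rather than in the input buffer, input can be supplied in pieces. The following is a minimal sketch of that idea, not a definitive implementation: `Utf32Stream` and `feed` are hypothetical names introduced here for illustration, and the code assumes the null character occupies a single byte in the active encoding (true for UTF-8, not guaranteed for every encoding).

```cpp
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <string>

// Hypothetical streaming decoder (illustration only). The conversion state
// persists between calls to feed(), so a multibyte character split across
// two chunks is completed by the later chunk.
struct Utf32Stream
{
    std::mbstate_t state{}; // zero-initialized: initial conversion state
    std::u32string out;     // decoded characters collected so far

    // Feeds one chunk of bytes; returns false on an encoding error.
    bool feed(const char* data, std::size_t len)
    {
        const char* end = data + len;
        while (data < end)
        {
            char32_t c32;
            const std::size_t rc = std::mbrtoc32(
                &c32, data, static_cast<std::size_t>(end - data), &state);

            if (rc == static_cast<std::size_t>(-1))
                return false; // encoding error
            if (rc == static_cast<std::size_t>(-2))
                return true;  // rest of the chunk was absorbed into state

            out.push_back(c32);
            if (rc == static_cast<std::size_t>(-3))
                continue;     // extra char32_t written, no input consumed

            // Assumption: a converted null character is one byte long.
            data += (rc == 0) ? 1 : rc;
        }
        return true;
    }
};
```

A caller would typically construct one `Utf32Stream` per input source and call `feed` once per buffer read, checking the returned flag after each call.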
If s is a null pointer, the values of n and pc32 are ignored and the call is equivalent to std::mbrtoc32(nullptr, "", 1, ps).
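A short sketch of the null-pointer form follows. The wrapper name `finish_conversion` is hypothetical and exists only to make the call easy to read; it is not part of any library.

```cpp
#include <cstddef>
#include <cuchar>
#include <cwchar>

// Hypothetical wrapper (illustration only): calls std::mbrtoc32 with a null
// input pointer. The pc32 and n arguments are ignored in this form, and the
// call behaves like std::mbrtoc32(nullptr, "", 1, &state).
std::size_t finish_conversion(std::mbstate_t& state)
{
    return std::mbrtoc32(nullptr, nullptr, 0, &state);
}
```

When `state` is in the initial conversion state, the call converts the null character and so returns 0, leaving `state` in the initial state.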
If the wide character produced is the null character, the conversion state *ps represents the initial shift state.
The multibyte encoding used by this function is specified by the currently active C locale.
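Since the encoding comes from the C locale, the same bytes can decode differently depending on the LC_CTYPE category in effect. The sketch below is an illustration under stated assumptions: `first_char_in_locale` is a hypothetical helper, and the set of locale names that exist is system-specific. It switches LC_CTYPE temporarily, converts the first character, and restores the previous locale.

```cpp
#include <clocale>
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <string>

// Hypothetical helper (illustration only): converts the first multibyte
// character of `text` while LC_CTYPE is temporarily set to `locale_name`.
// Returns false if the locale cannot be selected or the conversion fails.
bool first_char_in_locale(const char* locale_name, const std::string& text,
                          char32_t& result)
{
    // Remember the current LC_CTYPE setting so that it can be restored.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";

    if (!std::setlocale(LC_CTYPE, locale_name))
        return false; // locale not available on this system

    std::mbstate_t state{};
    char32_t c32 = 0;
    // size() + 1 lets the terminating null be examined for an empty string.
    const std::size_t rc =
        std::mbrtoc32(&c32, text.c_str(), text.size() + 1, &state);

    std::setlocale(LC_CTYPE, saved.c_str());

    if (rc == static_cast<std::size_t>(-1) ||
        rc == static_cast<std::size_t>(-2) ||
        rc == static_cast<std::size_t>(-3))
        return false;

    result = c32;
    return true;
}
```

Changing the locale affects the whole process, so a helper like this is only suitable for single-threaded code or for program startup.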
pc32 | - | pointer to the location where the resulting 32-bit character will be written |
s | - | pointer to the multibyte character string used as input |
n | - | limit on the number of bytes in s that can be examined |
ps | - | pointer to the conversion state object used when interpreting the multibyte string |
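The return value is the first of the following that applies:
0 | - | if the character converted from s (and stored in *pc32 if pc32 is not null) was the null character |
the number of bytes [1, n] | - | of the multibyte character successfully converted from s |
static_cast<std::size_t>(-3) | - | if the next char32_t from a multi-char32_t character has now been written to *pc32; no bytes are processed from the input in this case |
static_cast<std::size_t>(-2) | - | if the next n bytes constitute an incomplete, but so far valid, multibyte character; nothing is written to *pc32 |
static_cast<std::size_t>(-1) | - | if an encoding error occurs; nothing is written to *pc32, the value EILSEQ is stored in errno, and the value of *ps is unspecified |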
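The single std::size_t result carries several distinct outcomes: 0, a byte count, and the special values -1, -2, and -3 converted to std::size_t. One way to keep call sites readable is to map it onto an enumeration first. In the sketch below, `ConvStatus` and `classify` are hypothetical names chosen for illustration, not part of the standard library.

```cpp
#include <cstddef>

// Hypothetical classification of a std::mbrtoc32 return value.
enum class ConvStatus
{
    NullChar,      // 0: the null character was converted
    Converted,     // 1..n: that many bytes formed one character
    PendingOutput, // (size_t)-3: an additional char32_t was written
    Incomplete,    // (size_t)-2: more input bytes are needed
    Error          // (size_t)-1: encoding error
};

// Maps the raw return value to a ConvStatus. The special values are checked
// before the byte-count case because they are very large std::size_t values.
ConvStatus classify(std::size_t rc)
{
    if (rc == 0)
        return ConvStatus::NullChar;
    if (rc == static_cast<std::size_t>(-1))
        return ConvStatus::Error;
    if (rc == static_cast<std::size_t>(-2))
        return ConvStatus::Incomplete;
    if (rc == static_cast<std::size_t>(-3))
        return ConvStatus::PendingOutput;
    return ConvStatus::Converted;
}
```

A conversion loop can then use a `switch` over `classify(rc)` instead of a chain of casts and comparisons.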
#include <cassert>
#include <clocale>
#include <cstring>
#include <cuchar>
#include <cwchar>
#include <iomanip>
#include <iostream>
#include <string>

int main()
{
    std::setlocale(LC_ALL, "en_US.utf8");

    std::string str = "z\u00df\u6c34\U0001F34C"; // or u8"zß水🍌" (before C++20)

    std::cout << "Processing " << str.size() << " bytes: [ " << std::showbase;
    for (unsigned char c : str)
        std::cout << std::hex << +c << ' ';
    std::cout << "]\n";

    std::mbstate_t state{}; // zero-initialized to initial state
    char32_t c32;
    const char* ptr = str.c_str();
    const char* end = str.c_str() + str.size() + 1; // include the terminating null

    // The loop ends when the terminating null is converted (rc == 0).
    while (std::size_t rc = std::mbrtoc32(&c32, ptr, end - ptr, &state))
    {
        assert(rc != (std::size_t)-3); // no surrogates in UTF-32
        if (rc == (std::size_t)-1)     // encoding error: c32 was not written
            break;
        if (rc == (std::size_t)-2)     // incomplete character: c32 was not written
            break;

        std::cout << "Next UTF-32 char: " << std::hex
                  << static_cast<int>(c32) << " obtained from "
                  << std::dec << rc << " bytes [ ";
        for (std::size_t n = 0; n < rc; ++n)
            std::cout << std::hex << +static_cast<unsigned char>(ptr[n]) << ' ';
        std::cout << "]\n";

        ptr += rc;
    }
}
Output:
Processing 10 bytes: [ 0x7a 0xc3 0x9f 0xe6 0xb0 0xb4 0xf0 0x9f 0x8d 0x8c ]
Next UTF-32 char: 0x7a obtained from 1 bytes [ 0x7a ]
Next UTF-32 char: 0xdf obtained from 2 bytes [ 0xc3 0x9f ]
Next UTF-32 char: 0x6c34 obtained from 3 bytes [ 0xe6 0xb0 0xb4 ]
Next UTF-32 char: 0x1f34c obtained from 4 bytes [ 0xf0 0x9f 0x8d 0x8c ]
c32rtomb (C++11) | converts a UTF-32 character to narrow multibyte encoding (function) |
do_in [virtual] | converts a string from ExternT to InternT, such as when reading from file (virtual protected member function of std::codecvt<InternT,ExternT,StateT>) |
C documentation for mbrtoc32 |