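Defined in header <codecvt>
template< class Elem, unsigned long Maxcode = 0x10ffff, std::codecvt_mode Mode = (std::codecvt_mode)0 > class codecvt_utf8_utf16 : public std::codecvt<Elem, char, std::mbstate_t>; | (since C++11) (deprecated in C++17) (removed in C++26) |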
std::codecvt_utf8_utf16 is a std::codecvt facet which encapsulates conversion between a UTF-8 encoded byte string and a UTF-16 encoded character string. If Elem is a 32-bit type, one UTF-16 code unit will be stored in each 32-bit character of the output sequence.
This is an N:M conversion facet, and cannot be used with std::basic_filebuf (which only permits 1:N conversions, such as UTF-32/UTF-8, between the internal and the external encodings). This facet can be used with std::wstring_convert.
| Elem | - | either char16_t, char32_t, or wchar_t |
| Maxcode | - | the largest value of Elem that this facet will read or write without error |
| Mode | - | a constant of type std::codecvt_mode |
| (constructor) | constructs a new codecvt_utf8_utf16 facet (public member function) |
| (destructor) | destroys a codecvt_utf8_utf16 facet (public member function) |
explicit codecvt_utf8_utf16( std::size_t refs = 0 );
Constructs a new std::codecvt_utf8_utf16 facet, passes the initial reference counter refs to the base class.
| refs | - | the number of references that link to the facet |
~codecvt_utf8_utf16();
Destroys the facet. Unlike the locale-managed facets, this facet's destructor is public.
| Type | Definition |
| intern_type | internT |
| extern_type | externT |
| state_type | stateT |
| Member | Description |
| std::locale::id id [static] | the identifier of the facet |
| out | invokes do_out (public member function of std::codecvt<InternT,ExternT,StateT>) |
| in | invokes do_in (public member function of std::codecvt<InternT,ExternT,StateT>) |
| unshift | invokes do_unshift (public member function of std::codecvt<InternT,ExternT,StateT>) |
| encoding | invokes do_encoding (public member function of std::codecvt<InternT,ExternT,StateT>) |
| always_noconv | invokes do_always_noconv (public member function of std::codecvt<InternT,ExternT,StateT>) |
| length | invokes do_length (public member function of std::codecvt<InternT,ExternT,StateT>) |
| max_length | invokes do_max_length (public member function of std::codecvt<InternT,ExternT,StateT>) |
| do_out [virtual] | converts a string from InternT to ExternT, such as when writing to file (virtual protected member function of std::codecvt<InternT,ExternT,StateT>) |
| do_in [virtual] | converts a string from ExternT to InternT, such as when reading from file (virtual protected member function of std::codecvt<InternT,ExternT,StateT>) |
| do_unshift [virtual] | generates the termination character sequence of ExternT characters for incomplete conversion (virtual protected member function of std::codecvt<InternT,ExternT,StateT>) |
| do_encoding [virtual] | returns the number of ExternT characters necessary to produce one InternT character, if constant (virtual protected member function of std::codecvt<InternT,ExternT,StateT>) |
| do_always_noconv [virtual] | tests if the facet encodes an identity conversion for all valid argument values (virtual protected member function of std::codecvt<InternT,ExternT,StateT>) |
| do_length [virtual] | calculates the length of the ExternT string that would be consumed by conversion into given InternT buffer (virtual protected member function of std::codecvt<InternT,ExternT,StateT>) |
| do_max_length [virtual] | returns the maximum number of ExternT characters that could be converted into a single InternT character (virtual protected member function of std::codecvt<InternT,ExternT,StateT>) |
| Nested type | Definition |
| enum result { ok, partial, error, noconv }; | unscoped enumeration type |
| Enumeration constant | Definition |
| ok | conversion was completed with no error |
| partial | not all source characters were converted |
| error | encountered an invalid character |
| noconv | no conversion required, input and output types are the same |
    #include <cassert>
    #include <codecvt>
    #include <cstdint>
    #include <iostream>
    #include <locale>
    #include <string>

    int main()
    {
        std::string u8 = "z\u00df\u6c34\U0001f34c";
        std::u16string u16 = u"z\u00df\u6c34\U0001f34c";

        // UTF-8 to UTF-16/char16_t
        std::u16string u16_conv = std::wstring_convert<
            std::codecvt_utf8_utf16<char16_t>, char16_t>{}.from_bytes(u8);
        assert(u16 == u16_conv);
        std::cout << "UTF-8 to UTF-16 conversion produced "
                  << u16_conv.size() << " code units:\n" << std::showbase << std::hex;
        for (char16_t c : u16_conv)
            std::cout << static_cast<std::uint16_t>(c) << ' ';

        // UTF-16/char16_t to UTF-8
        std::string u8_conv = std::wstring_convert<
            std::codecvt_utf8_utf16<char16_t>, char16_t>{}.to_bytes(u16);
        assert(u8 == u8_conv);
        std::cout << "\nUTF-16 to UTF-8 conversion produced "
                  << std::dec << u8_conv.size() << " bytes:\n" << std::hex;
        for (char c : u8_conv)
            std::cout << +static_cast<unsigned char>(c) << ' ';
        std::cout << '\n';
    }

Output:

    UTF-8 to UTF-16 conversion produced 5 code units:
    0x7a 0xdf 0x6c34 0xd83c 0xdf4c
    UTF-16 to UTF-8 conversion produced 10 bytes:
    0x7a 0xc3 0x9f 0xe6 0xb0 0xb4 0xf0 0x9f 0x8d 0x8c
The following behavior-changing defect reports were applied retroactively to previously published C++ standards.
| DR | Applied to | Behavior as published | Correct behavior |
|---|---|---|---|
| LWG 2229 | C++11 | the constructor and destructor were not specified | specifies them |
| Character conversions | locale-defined multibyte (UTF-8, GB18030) | UTF-8 | UTF-16 |
|---|---|---|---|
| UTF-16 | mbrtoc16 / c16rtomb (with C11's DR488) | codecvt<char16_t,char,mbstate_t> | N/A |
| UCS-2 | c16rtomb (without C11's DR488) | codecvt_utf8<char16_t> | codecvt_utf16<char16_t> |
| UTF-32 | mbrtoc32 / c32rtomb | codecvt<char32_t,char,mbstate_t> | codecvt_utf16<char32_t> |
| system wchar_t: UTF-32 (non-Windows) | mbsrtowcs / wcsrtombs | codecvt_utf8<wchar_t> | codecvt_utf16<wchar_t> |
| codecvt | converts between character encodings, including UTF-8, UTF-16, UTF-32 (class template) |
| codecvt_mode (C++11)(deprecated in C++17)(removed in C++26) | tags to alter behavior of the standard codecvt facets (enum) |
| codecvt_utf8 (C++11)(deprecated in C++17)(removed in C++26) | converts between UTF-8 and UCS-2/UCS-4 (class template) |
| codecvt_utf16 (C++11)(deprecated in C++17)(removed in C++26) | converts between UTF-16 and UCS-2/UCS-4 (class template) |