cppreference.com

      std::codecvt_utf8

      From cppreference.com
       
      Defined in header <codecvt>

      template<
          class Elem,
          unsigned long Maxcode = 0x10ffff,
          std::codecvt_mode Mode = (std::codecvt_mode)0 >
      class codecvt_utf8
          : public std::codecvt<Elem, char, std::mbstate_t>;
      (since C++11)
      (deprecated in C++17)
      (removed in C++26)

      std::codecvt_utf8 is a std::codecvt facet which encapsulates conversion between a UTF-8 encoded byte string and a UCS-2 or UTF-32 character string (depending on the type of Elem). This std::codecvt facet can be used to read and write UTF-8 files, both text and binary.
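As a rough illustration of stream-based use, the sketch below pushes wide characters through std::wbuffer_convert configured with std::codecvt_utf8<wchar_t> and collects the UTF-8 bytes that come out. The helper name write_as_utf8 is hypothetical, and a std::stringbuf stands in for a file's byte buffer so the sketch needs no file system access. Like the facet itself, std::wbuffer_convert is deprecated since C++17, so compilers may warn.

```cpp
#include <codecvt>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the library): writes wide characters
// through a stream buffer that applies std::codecvt_utf8<wchar_t> and
// returns the UTF-8 bytes that reach the underlying byte buffer.
std::string write_as_utf8(const std::wstring& text)
{
    std::stringbuf bytes; // in-memory stand-in for a file's byte buffer
    std::wbuffer_convert<std::codecvt_utf8<wchar_t>> converter(&bytes);
    std::wostream out(&converter);
    out << text;
    out.flush(); // make sure the converted bytes reach the byte buffer
    return bytes.str();
}
```

Writing to a real file would follow the same pattern with a std::filebuf (or a std::wofstream imbued with a locale that holds the facet) in place of the std::stringbuf.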

      UCS-2 is an archaic encoding that is a subset of UTF-16, which encodes scalar values in the range U+0000-U+FFFF (Basic Multilingual Plane) only.


      Template Parameters

      Elem - either char16_t, char32_t, or wchar_t
      Maxcode - the largest value of Elem that this facet will read or write without error
      Mode - a constant of type std::codecvt_mode
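The effect of the Maxcode and Mode parameters can be sketched with std::wstring_convert. The two helpers below are hypothetical names chosen for illustration: the first uses std::consume_header so that a leading UTF-8 byte order mark is skipped, and the second restricts Maxcode to 0x7f so that any code point above that limit is reported as a conversion error (which std::wstring_convert surfaces as std::range_error when no error string is supplied).

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 to UTF-32, skipping a leading byte
// order mark if one is present (Mode = std::consume_header).
std::u32string decode_skipping_bom(const std::string& utf8)
{
    std::wstring_convert<
        std::codecvt_utf8<char32_t, 0x10ffff, std::consume_header>,
        char32_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper: reports whether every code point in the input is
// accepted by a facet whose Maxcode is restricted to 0x7f.
bool fits_in_ascii(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8<char32_t, 0x7f>, char32_t> conv;
    try
    {
        conv.from_bytes(utf8);
        return true;
    }
    catch (const std::range_error&)
    {
        return false; // a code point above Maxcode is a conversion error
    }
}
```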

      Member functions

      (constructor)
      constructs a new codecvt_utf8 facet
      (public member function)
      (destructor)
      destroys a codecvt_utf8 facet
      (public member function)

      std::codecvt_utf8::codecvt_utf8

      explicit codecvt_utf8( std::size_t refs = 0 );

      Constructs a new std::codecvt_utf8 facet and passes the initial reference counter refs to the base class.

      Parameters

      refs - the number of references that link to the facet

      std::codecvt_utf8::~codecvt_utf8

      ~codecvt_utf8();

      Destroys the facet. Unlike the locale-managed facets, this facet's destructor is public.
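Because the destructor is public, a codecvt_utf8 object can be created directly, for example on the stack, without being installed in a std::locale. The sketch below does that and calls the inherited public member out() to encode one code point. The helper name encode_code_point is hypothetical, and returning an empty string on failure is a simplification made for this sketch.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: encodes one code point as UTF-8 by calling out() on
// a facet object that lives on the stack rather than inside a std::locale.
std::string encode_code_point(char32_t cp)
{
    std::codecvt_utf8<char32_t> facet; // allowed: the destructor is public
    std::mbstate_t state{};
    const char32_t* from_next = nullptr;
    char buffer[8]; // more than enough room for one UTF-8 sequence
    char* to_next = nullptr;

    const std::codecvt_base::result result =
        facet.out(state, &cp, &cp + 1, from_next,
                  buffer, buffer + sizeof buffer, to_next);

    if (result != std::codecvt_base::ok)
        return std::string(); // simplification: empty string signals failure
    return std::string(buffer, to_next);
}
```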

      Inherited from std::codecvt

      Nested types

      Type | Definition
      intern_type | internT
      extern_type | externT
      state_type | stateT

      Data members

      Member | Description
      std::locale::id id [static] | the identifier of the facet

      Member functions

      out
      invokes do_out
      (public member function of std::codecvt<InternT,ExternT,StateT>)
      in
      invokes do_in
      (public member function of std::codecvt<InternT,ExternT,StateT>)
      unshift
      invokes do_unshift
      (public member function of std::codecvt<InternT,ExternT,StateT>)
      encoding
      invokes do_encoding
      (public member function of std::codecvt<InternT,ExternT,StateT>)
      always_noconv
      invokes do_always_noconv
      (public member function of std::codecvt<InternT,ExternT,StateT>)
      length
      invokes do_length
      (public member function of std::codecvt<InternT,ExternT,StateT>)
      max_length
      invokes do_max_length
      (public member function of std::codecvt<InternT,ExternT,StateT>)

      Protected member functions

      do_out [virtual]
      converts a string from InternT to ExternT, such as when writing to file
      (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
      do_in [virtual]
      converts a string from ExternT to InternT, such as when reading from file
      (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
      do_unshift [virtual]
      generates the termination character sequence of ExternT characters for incomplete conversion
      (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
      do_encoding [virtual]
      returns the number of ExternT characters necessary to produce one InternT character, if constant
      (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
      do_always_noconv [virtual]
      tests if the facet encodes an identity conversion for all valid argument values
      (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
      do_length [virtual]
      calculates the length of the ExternT string that would be consumed by conversion into given InternT buffer
      (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)
      do_max_length [virtual]
      returns the maximum number of ExternT characters that could be converted into a single InternT character
      (virtual protected member function of std::codecvt<InternT,ExternT,StateT>)

      Inherited from std::codecvt_base

      Nested type | Definition
      enum result { ok, partial, error, noconv }; | Unscoped enumeration type

      Enumeration constant | Definition
      ok | conversion was completed with no error
      partial | not all source characters were converted
      error | encountered an invalid character
      noconv | no conversion required, input and output types are the same
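The result codes above can be observed by calling in() directly. The following is a minimal sketch, with the hypothetical helper name decode_status, that decodes a byte string with std::codecvt_utf8<char32_t> and returns only the status. It is meant to be fed simple inputs: a complete string, a lone truncated multibyte sequence, or a lone invalid byte.

```cpp
#include <codecvt>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: runs in() over the given bytes and returns the
// conversion status, discarding the decoded characters.
std::codecvt_base::result decode_status(const std::string& utf8)
{
    std::codecvt_utf8<char32_t> facet;
    std::mbstate_t state{};
    // Each code point consumes at least one byte, so this buffer is large
    // enough; the extra element keeps it non-empty for empty input.
    std::u32string decoded(utf8.size() + 1, U'\0');
    const char* from_next = nullptr;
    char32_t* to_next = nullptr;

    return facet.in(state,
                    utf8.data(), utf8.data() + utf8.size(), from_next,
                    &decoded[0], &decoded[0] + decoded.size(), to_next);
}
```

With this helper, a well-formed string is expected to yield ok, the first two bytes of a three-byte sequence yield partial, and a stray continuation byte yields error.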

      Notes

      Although the standard requires that this facet works with UCS-2 when the size of Elem is 16 bits, some implementations use UTF-16 instead. The term "UCS-2" was deprecated and removed from ISO 10646.

      Example

      The following example demonstrates the difference between UCS-2/UTF-8 and UTF-16/UTF-8 conversions: the third character in the string is not a valid UCS-2 character.

      #include <codecvt>
      #include <cstdint>
      #include <iostream>
      #include <locale>
      #include <string>

      int main()
      {
          // UTF-8 data. The character U+1d10b, musical sign segno, does not fit in UCS-2
          std::string utf8 = "z\u6c34\U0001d10b";

          // the UTF-8 / UTF-16 standard conversion facet
          std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> utf16conv;
          std::u16string utf16 = utf16conv.from_bytes(utf8);
          std::cout << "UTF-16 conversion produced " << utf16.size() << " code units:\n"
                    << std::showbase << std::hex;
          for (char16_t c : utf16)
              std::cout << static_cast<std::uint16_t>(c) << ' ';

          // the UTF-8 / UCS-2 standard conversion facet
          std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> ucs2conv;
          try
          {
              std::u16string ucs2 = ucs2conv.from_bytes(utf8);
          }
          catch (const std::range_error& e)
          {
              std::u16string ucs2 = ucs2conv.from_bytes(utf8.substr(0, ucs2conv.converted()));
              std::cout << "\nUCS-2 failed after producing " << std::dec << ucs2.size()
                        << " characters:\n" << std::showbase << std::hex;
              for (char16_t c : ucs2)
                  std::cout << static_cast<std::uint16_t>(c) << ' ';
              std::cout << '\n';
          }
      }

      Output:

      UTF-16 conversion produced 4 code units:
      0x7a 0x6c34 0xd834 0xdd0b
      UCS-2 failed after producing 2 characters:
      0x7a 0x6c34

      Defect reports

      The following behavior-changing defect reports were applied retroactively to previously published C++ standards.

      DR | Applied to | Behavior as published | Correct behavior
      LWG 2229 | C++98 | the constructor and destructor were not specified | specifies them

      See also

      Character conversions | locale-defined multibyte (UTF-8, GB18030) | UTF-8 | UTF-16
      UTF-16 | mbrtoc16 / c16rtomb (with C11's DR488) | codecvt<char16_t,char,mbstate_t>, codecvt_utf8_utf16<char16_t>, codecvt_utf8_utf16<char32_t>, codecvt_utf8_utf16<wchar_t> | N/A
      UCS-2 | c16rtomb (without C11's DR488) | codecvt_utf8<char16_t> | codecvt_utf16<char16_t>
      UTF-32 | mbrtoc32 / c32rtomb | codecvt<char32_t,char,mbstate_t>, codecvt_utf8<char32_t> | codecvt_utf16<char32_t>
      system wchar_t: UTF-32 (non-Windows), UCS-2 (Windows) | mbsrtowcs / wcsrtombs, use_facet<codecvt<wchar_t,char,mbstate_t>>(locale) | codecvt_utf8<wchar_t> | codecvt_utf16<wchar_t>

      codecvt
      converts between character encodings, including UTF-8, UTF-16, UTF-32
      (class template)
      codecvt_mode (C++11)(deprecated in C++17)(removed in C++26)
      tags to alter behavior of the standard codecvt facets
      (enum)
      codecvt_utf16 (C++11)(deprecated in C++17)(removed in C++26)
      converts between UTF-16 and UCS-2/UCS-4
      (class template)
      codecvt_utf8_utf16 (C++11)(deprecated in C++17)(removed in C++26)
      converts between UTF-8 and UTF-16
      (class template)
      Retrieved from "https://en.cppreference.com/mwiki/index.php?title=cpp/locale/codecvt_utf8&oldid=177705"
