UTF-8/16/32 C++11 header only library for Windows / Linux / macOS


ww898/utf-cpp


This is a C++11 template-based, header-only library for Windows/Linux/macOS that converts UTF-8/16/32 symbols and strings. The library transparently supports wchar_t as UTF-16 on Windows and as UTF-32 on Linux and macOS.
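
One way to picture the transparent wchar_t handling is a compile-time choice based on the width of wchar_t. The following is only a sketch of that idea: `utf16_tag`, `utf32_tag`, `wide_encoding` and `wchar_encoding` are hypothetical names invented for illustration and are not the library's own types.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical encoding tags, named for illustration only.
struct utf16_tag {};
struct utf32_tag {};

// Maps the size of a code unit type (in bytes) to an encoding tag.
// Sizes other than 2 and 4 are left undefined and fail to compile.
template <std::size_t Size> struct wide_encoding;
template <> struct wide_encoding<2> { using type = utf16_tag; };
template <> struct wide_encoding<4> { using type = utf32_tag; };

// Resolves to utf16_tag where wchar_t is 2 bytes (e.g. Windows)
// and to utf32_tag where it is 4 bytes (e.g. Linux, macOS).
using wchar_encoding = wide_encoding<sizeof(wchar_t)>::type;

static_assert(
    std::is_same<wchar_encoding, utf16_tag>::value !=
    std::is_same<wchar_encoding, utf32_tag>::value,
    "wchar_t must map to exactly one encoding");
```

The final static_assert mirrors the check at the end of the usage example: wchar_t matches exactly one of the two wide encodings on any given platform.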

UTF-8 and UTF-32 (UCS-32) both support 31-bit code points [0‥0x7FFFFFFF] with no restrictions. UTF-16 supports only Unicode code points [0‥0x10FFFF], and the high [0xD800‥0xDBFF] and low [0xDC00‥0xDFFF] surrogate regions are prohibited as code points.
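
The ranges above can be written down as simple predicates. This is a minimal sketch under the stated ranges only; the function names are hypothetical and do not come from the library.

```cpp
#include <cstdint>

// Hypothetical helpers, for illustration only.
constexpr bool is_high_surrogate(std::uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDBFF;
}

constexpr bool is_low_surrogate(std::uint32_t cp) {
    return cp >= 0xDC00 && cp <= 0xDFFF;
}

// UTF-8 and UTF-32 as described here: any 31-bit value is accepted.
constexpr bool fits_utf8_or_utf32(std::uint32_t cp) {
    return cp <= 0x7FFFFFFF;
}

// UTF-16: within [0, 0x10FFFF] and outside both surrogate regions.
constexpr bool fits_utf16(std::uint32_t cp) {
    return cp <= 0x10FFFF && !is_high_surrogate(cp) && !is_low_surrogate(cp);
}

static_assert(fits_utf16(0x10FFFF), "last Unicode code point");
static_assert(!fits_utf16(0xD800), "surrogates are not code points");
static_assert(!fits_utf16(0x110000) && fits_utf8_or_utf32(0x110000),
              "beyond Unicode, still within 31 bits");
```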

The maximum UTF-16 symbol size is 2 words (4 bytes; both words must be in the surrogate regions). UTF-32 (UCS-32) is always 1 word (4 bytes). The maximum UTF-8 symbol size (see the conversion table for details) is:

  • 4 bytes for unicode code points
  • 6 bytes for 31-bit code points
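
The two limits above follow from the classic extended UTF-8 layout, in which each additional byte widens the representable range. The sketch below assumes those classic thresholds (the conversion table itself is not reproduced in this text), and `utf8_length` is a hypothetical helper, not a library function.

```cpp
#include <cstdint>

// Returns the number of bytes needed to encode cp in UTF-8 using the
// classic 1..6 byte layout, or 0 if cp does not fit into 31 bits.
inline int utf8_length(std::uint32_t cp) {
    if (cp < 0x80)        return 1;  //  7 payload bits
    if (cp < 0x800)       return 2;  // 11 payload bits
    if (cp < 0x10000)     return 3;  // 16 payload bits
    if (cp < 0x200000)    return 4;  // 21 payload bits, covers 0x10FFFF
    if (cp < 0x4000000)   return 5;  // 26 payload bits
    if (cp <= 0x7FFFFFFF) return 6;  // 31 payload bits
    return 0;                        // wider than 31 bits
}
```

Under these assumptions `utf8_length(0x10FFFF)` is 4 and `utf8_length(0x7FFFFFFF)` is 6, matching the two bullets.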

UTF-16 surrogate decoder:

High\Low   DC00     DC01     …   DFFF
D800       010000   010001   …   0103FF
D801       010400   010401   …   0107FF
…          …        …        …   …
DBFF       10FC00   10FC01   …   10FFFF
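
The values in the surrogate decoder table follow the standard UTF-16 arithmetic: each high surrogate selects a block of 0x400 code points starting at 0x10000, and the low surrogate selects the offset inside that block. A minimal sketch, with hypothetical function names that are not part of the library:

```cpp
#include <cassert>
#include <cstdint>

// Combines a high and a low surrogate into one code point.
inline std::uint32_t decode_surrogate_pair(std::uint16_t high, std::uint16_t low) {
    assert(high >= 0xD800 && high <= 0xDBFF);  // high surrogate region
    assert(low >= 0xDC00 && low <= 0xDFFF);    // low surrogate region
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}

// Splits a code point in [0x10000, 0x10FFFF] into a surrogate pair.
inline void encode_surrogate_pair(std::uint32_t cp,
                                  std::uint16_t& high, std::uint16_t& low) {
    assert(cp >= 0x10000u && cp <= 0x10FFFFu);
    cp -= 0x10000u;                                             // 20 bits remain
    high = static_cast<std::uint16_t>(0xD800u + (cp >> 10));    // upper 10 bits
    low  = static_cast<std::uint16_t>(0xDC00u + (cp & 0x3FFu)); // lower 10 bits
}
```

For example, `decode_surrogate_pair(0xD801, 0xDC00)` gives 0x10400 and `decode_surrogate_pair(0xDBFF, 0xDFFF)` gives 0x10FFFF, which agrees with the corresponding table cells.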

UTF-16 Surrogates

Supported compilers

Tested on the following compilers:

Usage example

    // यूनिकोड
    static char const u8s[] = "\xE0\xA4\xAF\xE0\xA5\x82\xE0\xA4\xA8\xE0\xA4\xBF\xE0\xA4\x95\xE0\xA5\x8B\xE0\xA4\xA1";
    using namespace ww898::utf;

    // Conversions that write to an output iterator.
    std::u16string u16;
    convz<utf_selector_t<decltype(*u8s)>, utf16>(u8s, std::back_inserter(u16));
    std::u32string u32;
    conv<utf16, utf_selector_t<decltype(u32)::value_type>>(u16.begin(), u16.end(), std::back_inserter(u32));
    std::vector<char> u8;
    convz<utf32, utf8>(u32.data(), std::back_inserter(u8));
    std::wstring uw;
    // Note: sizeof(u8s) counts the terminating '\0', so it is part of the input range.
    conv<utf8, utfw>(u8s, u8s + sizeof(u8s), std::back_inserter(uw));

    // Conversions that return the result.
    auto u8r = conv<char>(uw);
    auto u16r = conv<char16_t>(u16);
    auto uwr = convz<wchar_t>(u8s);
    auto u32r = conv<char32_t>(std::string_view(u8r.data(), u8r.size())); // C++17 only

    static_assert(std::is_same<utf_selector<decltype(*u8s)>, utf_selector<decltype(u8)::value_type>>::value, "Fail");
    static_assert(
        std::is_same<utf_selector_t<decltype(u16)::value_type>, utf_selector_t<decltype(uw)::value_type>>::value !=
        std::is_same<utf_selector_t<decltype(u32)::value_type>, utf_selector_t<decltype(uw)::value_type>>::value, "Fail");

UTF-8 Conversion table

UTF-8/32 table

