Defined in header <cstdlib>
int mblen( const char* s, std::size_t n );
Determines the size, in bytes, of the multibyte character whose first byte is pointed to by s.
If s is a null pointer, resets the global conversion state and determines whether shift sequences are used.
This function is equivalent to the call std::mbtowc(nullptr, s, n), except that the conversion state of std::mbtowc is unaffected.
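The equivalence with std::mbtowc can be checked with a small sketch. The helper name mblen_matches_mbtowc is hypothetical. Because each function keeps its own hidden conversion state, both are reset before the comparison.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: reports whether std::mblen and std::mbtowc(nullptr, s, n)
// agree on the byte count of the first multibyte character in s.
bool mblen_matches_mbtowc(const char* s, std::size_t n)
{
    std::mblen(nullptr, 0);           // reset mblen()'s hidden state
    std::mbtowc(nullptr, nullptr, 0); // reset mbtowc()'s separate hidden state

    const int by_mblen = std::mblen(s, n);
    const int by_mbtowc = std::mbtowc(nullptr, s, n); // null pwc: nothing is stored
    return by_mblen == by_mbtowc;
}
```

In the default "C" locale, both calls are expected to report 1 byte for "abc" and 0 for an empty string, so the helper returns true in both cases.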
Each call to mblen updates the internal global conversion state (a static object of type std::mbstate_t, known only to this function). If the multibyte encoding uses shift states, care must be taken to avoid backtracking or multiple scans. In any case, multiple threads should not call mblen without synchronization: std::mbrlen may be used instead.
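A minimal sketch of the std::mbrlen alternative, assuming the caller only needs a character count. The function count_chars_mbrlen is hypothetical. It keeps its conversion state in a local std::mbstate_t, so concurrent calls do not share hidden state (the locale itself is still process-wide).

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string_view>

// Hypothetical character counter: each call owns its std::mbstate_t instead of
// relying on mblen()'s hidden static state.
std::size_t count_chars_mbrlen(std::string_view s)
{
    std::mbstate_t state{}; // zero-initialized: the initial shift state
    std::size_t count = 0;
    const char* ptr = s.data();
    const char* const end = ptr + s.size();
    while (ptr < end)
    {
        const std::size_t next =
            std::mbrlen(ptr, static_cast<std::size_t>(end - ptr), &state);
        if (next == static_cast<std::size_t>(-1))
            throw std::runtime_error("invalid multibyte sequence");
        if (next == static_cast<std::size_t>(-2))
            throw std::runtime_error("truncated multibyte sequence");
        // mbrlen() returns 0 for an embedded '\0', which still occupies one byte
        ptr += (next == 0) ? 1 : next;
        ++count;
    }
    return count;
}
```

Unlike mblen, std::mbrlen distinguishes an invalid sequence ((size_t)-1) from an incomplete one ((size_t)-2), which the sketch reports separately.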
s - pointer to the multibyte character
n - limit on the number of bytes in s that can be examined
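The limit n matters when a character needs more bytes than are allowed: mblen then reports -1, the same value it uses for an invalid sequence. The sketch below wraps that check in a hypothetical helper, first_char_fits.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: true if the first multibyte character of s is valid and
// complete within the first n bytes. mblen() returns -1 both for an invalid
// sequence and for a character that would need more than n bytes.
bool first_char_fits(const char* s, std::size_t n)
{
    std::mblen(nullptr, 0); // start from the initial shift state
    return std::mblen(s, n) != -1;
}
```

Assuming a UTF-8 locale is active, first_char_fits("\xC3\x9F", 1) would be expected to return false, because U+00DF (ß) is encoded in two bytes, while first_char_fits("\xC3\x9F", 2) would return true.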
If s is not a null pointer, returns the number of bytes that are contained in the multibyte character, or -1 if the next n or fewer bytes pointed to by s do not form a valid multibyte character, or 0 if s is pointing at the null character '\0'.
If s is a null pointer, resets its internal conversion state to represent the initial shift state and returns 0 if the current multibyte encoding is not state-dependent (does not use shift sequences) or a non-zero value if the current multibyte encoding is state-dependent (uses shift sequences).
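The return-value cases can be mapped to descriptions with a short sketch. Both function names below are hypothetical; the second one relies only on the null-pointer behavior described above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: turns mblen()'s result for a non-null s into text.
std::string describe_first_char(const char* s, std::size_t n)
{
    std::mblen(nullptr, 0); // reset the hidden state before inspecting s
    const int len = std::mblen(s, n);
    if (len == -1)
        return "invalid or incomplete";
    if (len == 0)
        return "null character";
    return std::to_string(len) + " byte(s)";
}

// Hypothetical helper: with a null pointer, mblen() resets its state and returns
// non-zero only if the current encoding uses shift sequences.
bool encoding_is_state_dependent()
{
    return std::mblen(nullptr, 0) != 0;
}
```

In the default "C" locale, the encoding is expected not to be state-dependent, so encoding_is_state_dependent() returns false there.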
```cpp
#include <clocale>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string_view>

// count the characters of a multibyte string by stepping through it with mblen()
// note: for a null-terminated string, std::mbstowcs(nullptr, str, 0) is simpler
std::size_t strlen_mb(const std::string_view s)
{
    std::mblen(nullptr, 0); // reset the conversion state
    std::size_t result = 0;
    const char* ptr = s.data();
    for (const char* const end = ptr + s.size(); ptr < end; ++result)
    {
        const int next = std::mblen(ptr, static_cast<std::size_t>(end - ptr));
        if (next == -1)
            throw std::runtime_error("strlen_mb(): conversion error");
        // mblen() returns 0 for an embedded '\0'; step over that single byte
        ptr += (next == 0) ? 1 : next;
    }
    return result;
}

void dump_bytes(const std::string_view str)
{
    std::cout << std::hex << std::uppercase << std::setfill('0');
    for (unsigned char c : str)
        std::cout << std::setw(2) << static_cast<int>(c) << ' ';
    std::cout << std::dec << '\n';
}

int main()
{
    // allow mblen() to work with UTF-8 multibyte encoding
    std::setlocale(LC_ALL, "en_US.utf8");
    // UTF-8 narrow multibyte encoding; same as "zß水🍌" when the literal encoding is UTF-8
    const std::string_view str = "z\u00df\u6c34\U0001f34c";
    std::cout << std::quoted(str) << " is " << strlen_mb(str)
              << " characters, but as much as " << str.size() << " bytes: ";
    dump_bytes(str);
}
```
Possible output:

```
"zß水🍌" is 4 characters, but as much as 10 bytes: 7A C3 9F E6 B0 B4 F0 9F 8D 8C
```
mbtowc - converts the next multibyte character to wide character (function)
mbrlen - returns the number of bytes in the next multibyte character, given state (function)
C documentation for mblen