      Character sets and encodings

      From cppreference.com

      This page describes several character sets specified by the C++ standard.


      Translation character set

      The translation character set consists of the following elements:

      • each abstract character assigned a code point in the Unicode codespace, and
      • a distinct character for each Unicode scalar value not assigned to an abstract character.

      The translation character set is a superset of the basic character set and the basic literal character set (see below).

      (since C++23)
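
To make the second bullet concrete: a Unicode scalar value is any code point in the codespace U+0000 .. U+10FFFF that is not a surrogate code point (U+D800 .. U+DFFF). The following is a minimal sketch of that range check; the name `is_unicode_scalar_value` is hypothetical and is not a standard library facility.

```cpp
// Hypothetical helper, not part of the standard library.
// A Unicode scalar value is any code point in the Unicode codespace
// (U+0000 .. U+10FFFF) that is not a surrogate (U+D800 .. U+DFFF).
constexpr bool is_unicode_scalar_value(char32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

static_assert(is_unicode_scalar_value(U'A'), "U+0041 is a scalar value");
static_assert(is_unicode_scalar_value(0x10FFFF), "last code point of the codespace");
static_assert(!is_unicode_scalar_value(0xD800), "surrogates are excluded");
static_assert(!is_unicode_scalar_value(0x110000), "outside the codespace");
```

The check says nothing about whether a code point is assigned to an abstract character; the translation character set covers both cases.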

      Basic character set

      The basic character set consists of the following 96 (until C++26) or 99 (since C++26) characters:

      Code point          Character                       Glyph
      U+0009              Character tabulation
      U+000B              Line tabulation
      U+000C              Form feed (FF)
      U+0020              Space
      U+000A              Line feed (LF)                  new-line
      U+0021              Exclamation mark                !
      U+0022              Quotation mark                  "
      U+0023              Number sign                     #
      U+0025              Percent sign                    %
      U+0026              Ampersand                       &
      U+0027              Apostrophe                      '
      U+0028              Left parenthesis                (
      U+0029              Right parenthesis               )
      U+002A              Asterisk                        *
      U+002B              Plus sign                       +
      U+002C              Comma                           ,
      U+002D              Hyphen-minus                    -
      U+002E              Full stop                       .
      U+002F              Solidus                         /
      U+0030 .. U+0039    Digit zero .. nine              0 1 2 3 4 5 6 7 8 9
      U+003A              Colon                           :
      U+003B              Semicolon                       ;
      U+003C              Less-than sign                  <
      U+003D              Equals sign                     =
      U+003E              Greater-than sign               >
      U+003F              Question mark                   ?
      U+0041 .. U+005A    Latin capital letter A .. Z     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
      U+005B              Left square bracket             [
      U+005C              Reverse solidus                 \
      U+005D              Right square bracket            ]
      U+005E              Circumflex accent               ^
      U+005F              Low line                        _
      U+0061 .. U+007A    Latin small letter a .. z       a b c d e f g h i j k l m n o p q r s t u v w x y z
      U+007B              Left curly bracket              {
      U+007C              Vertical line                   |
      U+007D              Right curly bracket             }
      U+007E              Tilde                           ~

      The following characters are added to the basic character set since C++26:

      Code point          Character                       Glyph
      U+0024              Dollar Sign                     $
      U+0040              Commercial At                   @
      U+0060              Grave Accent                    `
      (since C++26)
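
The membership rules in the tables above can be written as a small predicate over code points. This is only an illustrative sketch: the names `in_basic_character_set` and `basic_character_set_size` and the `cxx26` flag are hypothetical, and the counting loop assumes C++14 or later for `constexpr`.

```cpp
// Hypothetical helper; `cxx26` selects the C++26 additions ($, @, `).
constexpr bool in_basic_character_set(char32_t cp, bool cxx26 = false)
{
    return (cp == 0x09 || cp == 0x0A || cp == 0x0B || cp == 0x0C) // HT, LF, VT, FF
        || (cp >= 0x20 && cp <= 0x7E                               // space .. tilde
            && ((cp != 0x24 && cp != 0x40 && cp != 0x60) || cxx26));
}

// Counts the members among the first 128 code points.
constexpr int basic_character_set_size(bool cxx26)
{
    int n = 0;
    for (char32_t cp = 0; cp < 0x80; ++cp)
        if (in_basic_character_set(cp, cxx26))
            ++n;
    return n;
}

static_assert(basic_character_set_size(false) == 96, "until C++26");
static_assert(basic_character_set_size(true) == 99, "since C++26");
```

The two `static_assert`s reproduce the counts stated at the top of this section.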

      Basic literal character set

      The basic literal character set consists of all characters of the basic character set, plus the following control characters:

      Code point          Character
      U+0000              Null
      U+0007              Bell
      U+0008              Backspace
      U+000D              Carriage return (CR)

      Execution character set

      The execution character set and the execution wide-character set are supersets of the basic literal character set. The encodings of the execution character sets and the sets of additional elements (if any) are locale-specific. Each element of the execution wide-character set must be representable as a distinct wchar_t code unit.

      Code unit and literal encoding

      A code unit is an integer value of character type. Characters in a character literal other than a multicharacter or non-encodable character literal or in a string literal are encoded as a sequence of one or more code units, as determined by the encoding prefix; this is termed the respective literal encoding.

      A literal encoding or a locale-specific encoding of one of the execution character sets encodes each element of the basic literal character set as a single code unit with non-negative value, distinct from the code unit for any other such element. A character not in the basic literal character set can be encoded with more than one code unit; the value of such a code unit can be the same as that of a code unit for an element of the basic literal character set. The encodings of the execution character sets can be unrelated to any literal encoding.
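
The single-code-unit and non-negative guarantees can be checked at compile time for the ordinary literal encoding. The sketch below is illustrative only: `basic_sample` and `all_code_units_non_negative` are hypothetical names, the sample lists the 96 pre-C++26 basic characters plus bell, backspace and carriage return, and the loop assumes C++14 or later.

```cpp
// Hypothetical sample: the 96 basic characters (until C++26) followed by
// \a, \b and \r from the basic literal character set.
constexpr char basic_sample[] =
    "\t\v\f \n"
    "!\"#%&'()*+,-./"
    "0123456789"
    ":;<=>?"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "[\\]^_"
    "abcdefghijklmnopqrstuvwxyz"
    "{|}~"
    "\a\b\r";

// Hypothetical helper: true if every code unit before the terminating
// null has a non-negative value.
constexpr bool all_code_units_non_negative(const char* s)
{
    for (; *s != '\0'; ++s)
        if (*s < 0)
            return false;
    return true;
}

// 99 characters, one code unit each, plus the terminating null.
static_assert(sizeof(basic_sample) == 100, "one code unit per element");
// Holds even on implementations where plain char is signed.
static_assert(all_code_units_non_negative(basic_sample), "non-negative code units");
```

A character outside the basic literal character set carries no such guarantee: in an ordinary literal it may occupy several code units, and those may be negative where plain char is signed.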

      The ordinary literal encoding is the encoding applied to an ordinary character or string literal. The wide literal encoding is the encoding applied to a wide character or string literal.
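
As a brief illustration, the encoding prefix determines the code unit type of a literal: no prefix gives char, and the L prefix gives wchar_t. The checks below are a sketch that uses only standard type traits and introduces no other names.

```cpp
#include <type_traits>

// An ordinary character literal has type char; a wide one has type wchar_t.
static_assert(std::is_same<decltype('a'), char>::value, "ordinary character literal");
static_assert(std::is_same<decltype(L'a'), wchar_t>::value, "wide character literal");

// String literals are lvalue arrays of const code units, including the null.
static_assert(std::is_same<decltype("ab"), const char (&)[3]>::value,
              "ordinary string literal");
static_assert(std::is_same<decltype(L"ab"), const wchar_t (&)[3]>::value,
              "wide string literal");
```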

      The U+0000 NULL character is encoded as the value 0. No other element of the translation character set is encoded with a code unit of value 0. The code unit value of each decimal digit character after the digit 0 (U+0030) shall be one greater than the value of the previous. The ordinary and wide literal encodings are otherwise implementation-defined.
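
The digit rule is what makes the common `c - '0'` idiom portable across ordinary literal encodings. A minimal sketch, with the hypothetical helper name `digit_value`:

```cpp
// Hypothetical helper: converts a decimal digit character to its value,
// or returns -1 for any other character. It relies only on the guarantee
// that '0' .. '9' have consecutive code unit values.
constexpr int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

static_assert('\0' == 0, "U+0000 is encoded as the value 0");
static_assert(L'\0' == 0, "the same holds for the wide literal encoding");
static_assert('9' - '0' == 9, "digits are consecutive");
static_assert(digit_value('7') == 7, "digit conversion");
static_assert(digit_value('x') == -1, "not a digit");
```

No comparable guarantee exists for letters: in EBCDIC-based encodings, for example, 'a' .. 'z' are not contiguous, so `c - 'a'` is not a portable way to index letters.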

      For a UTF-8, UTF-16, or UTF-32 literal, the UCS scalar value corresponding to each character of the translation character set is encoded as specified in ISO/IEC 10646 for the respective UCS encoding form.
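
The effect of the encoding form shows up in the number of code units each literal contains. The sketch below uses universal character names so that it does not depend on the source file encoding; the helper name `code_units` is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null.
template <class CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N])
{
    return N - 1;
}

// U+00E9 (LATIN SMALL LETTER E WITH ACUTE)
static_assert(code_units(u8"\u00E9") == 2, "two UTF-8 code units");
static_assert(code_units(u"\u00E9") == 1, "one UTF-16 code unit");
static_assert(code_units(U"\u00E9") == 1, "one UTF-32 code unit");

// U+1F600 lies outside the Basic Multilingual Plane
static_assert(code_units(u8"\U0001F600") == 4, "four UTF-8 code units");
static_assert(code_units(u"\U0001F600") == 2, "a UTF-16 surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "one UTF-32 code unit");
```

The template deduces the code unit type, so the same checks work whether a u8 literal has element type char (before C++20) or char8_t (since C++20).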

      Notes

      The standard names of some character sets are changed in C++23 via P2314R4.

      New name(s)                    Old name(s)
      basic character set            basic source character set
      basic literal character set    basic execution character set
                                     basic execution wide-character set

      Mapping from source file characters (other than those of a UTF-8 source file, since C++23) to the basic character set (until C++23) or the translation character set (since C++23) during translation phase 1 is implementation-defined, so an implementation is required to document how the basic source characters are represented in source files.

      Defect reports

      The following behavior-changing defect reports were applied retroactively to previously published C++ standards.

      DR          Applied to    Behavior as published                              Correct behavior
      CWG 788     C++98         the values of the members of the execution         they are locale-specific
                                character sets were implementation-defined,
                                but were not locale-specific
      CWG 1796    C++98         the representation of the null (wide) character    only required value to be zero
                                in basic execution (wide-)character set had
                                all zero bits

      See also

      ASCII chart
      C documentation for Character sets and encodings
      Retrieved from "https://en.cppreference.com/mwiki/index.php?title=cpp/language/charset&oldid=182788"
