
      Character sets and encodings

      From cppreference.com

      Basic character set

      The basic character set consists of the following 95 characters:

      Code point          Character                     Glyph
      U+0009              Character tabulation
      U+000B              Line tabulation
      U+000C              Form feed (FF)
      U+0020              Space
      U+0021              Exclamation mark              !
      U+0022              Quotation mark                "
      U+0023              Number sign                   #
      U+0025              Percent sign                  %
      U+0026              Ampersand                     &
      U+0027              Apostrophe                    '
      U+0028              Left parenthesis              (
      U+0029              Right parenthesis             )
      U+002A              Asterisk                      *
      U+002B              Plus sign                     +
      U+002C              Comma                         ,
      U+002D              Hyphen-minus                  -
      U+002E              Full stop                     .
      U+002F              Solidus                       /
      U+0030 .. U+0039    Digit zero .. nine            0 1 2 3 4 5 6 7 8 9
      U+003A              Colon                         :
      U+003B              Semicolon                     ;
      U+003C              Less-than sign                <
      U+003D              Equals sign                   =
      U+003E              Greater-than sign             >
      U+003F              Question mark                 ?
      U+0041 .. U+005A    Latin capital letter A .. Z   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
      U+005B              Left square bracket           [
      U+005C              Reverse solidus               \
      U+005D              Right square bracket          ]
      U+005E              Circumflex accent             ^
      U+005F              Low line                      _
      U+0061 .. U+007A    Latin small letter a .. z     a b c d e f g h i j k l m n o p q r s t u v w x y z
      U+007B              Left curly bracket            {
      U+007C              Vertical line                 |
      U+007D              Right curly bracket           }
      U+007E              Tilde                         ~
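For illustration, the 95 characters in the table can be collected into a single ordinary string literal and used for a membership test. This is a minimal sketch. `basic_chars` and `is_basic_char` are hypothetical names, not part of the C standard library. The three control characters are written as the escapes \t, \v and \f.

```c
#include <stdbool.h>
#include <string.h>

/* All 95 members of the basic character set: HT, VT and FF as escapes,
   then space and the 91 graphic characters from the table. */
static const char basic_chars[] =
    "\t\v\f "
    "!\"#%&'()*+,-./0123456789:;<=>?"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"
    "abcdefghijklmnopqrstuvwxyz{|}~";

/* Returns true if c is a member of the basic character set.
   strchr also matches the terminating '\0', so NUL is rejected explicitly. */
bool is_basic_char(char c)
{
    return c != '\0' && strchr(basic_chars, c) != NULL;
}
```

For example, `is_basic_char('~')` is true. `is_basic_char('$')` and `is_basic_char('\n')` are false, because neither character appears in the table.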

      Unlike in C++, the U+000A LINE FEED (LF) character is not included in the basic character set. Instead, there shall be some way of indicating the end of each line of text in the source file, and the C standard treats such an end-of-line indicator as if it were a single new-line character.

      The basic character set is also known as the basic source character set.

      Basic execution character set

      The basic execution character set contains all the members of the basic character set, plus the following characters:

      Code point          Character
      U+0000              Null
      U+0007              Bell
      U+0008              Backspace
      U+000A              Line feed (LF)
      U+000D              Carriage return (CR)

      For each basic execution character set, the values of the members shall be non-negative and distinct from one another. In both the source and execution basic character sets, the value of each decimal digit after 0 shall be one greater than the value of the previous digit, so '0' through '9' occupy consecutive values. The U+0000 NULL character has the value 0.
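Only the digits carry this consecutive-values guarantee. Converting a digit character to its value by subtracting '0' is therefore portable, but the same trick on letters is not: in EBCDIC, for example, the letters A to Z are not contiguous. The sketch below uses the hypothetical helpers `digit_value` and `parse_decimal`.

```c
/* Converts a decimal digit character to its numeric value.
   Relies only on '0'..'9' being contiguous and increasing;
   returns -1 for any other character. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}

/* Parses an unsigned decimal number from s, stopping at the first
   non-digit character. Overflow is not checked in this sketch. */
unsigned long parse_decimal(const char *s)
{
    unsigned long value = 0;
    int d;
    while ((d = digit_value(*s)) >= 0) {
        value = value * 10 + (unsigned long)d;
        ++s;
    }
    return value;
}
```

For example, `parse_decimal("2025x")` stops at 'x' and returns 2025 on any conforming implementation, whatever the character encoding.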

      The representation of each member of the basic execution character set fits in a single byte.
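Taken together, the non-negative, distinct and single-byte requirements can be checked at run time. The sketch below lists the basic execution character set in one literal. The literal's own terminating '\0' supplies NUL. The sketch then verifies that every member is non-negative when stored in a plain `char`, which may be signed, and that no two members share a value. `basic_exec` and `basic_exec_set_ok` are hypothetical names.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* The basic character set plus BEL, BS, LF and CR; NUL is the
   literal's terminating '\0'. */
static const char basic_exec[] =
    "\t\v\f \a\b\n\r"
    "!\"#%&'()*+,-./0123456789:;<=>?"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"
    "abcdefghijklmnopqrstuvwxyz{|}~";

/* Returns true if every member is non-negative as a char and all
   members have distinct values. */
bool basic_exec_set_ok(void)
{
    /* One flag per possible byte value; indexing by unsigned char works
       because every member fits in a single byte. */
    bool seen[UCHAR_MAX + 1] = { false };

    /* sizeof includes the terminating '\0', so NUL is checked too. */
    for (size_t i = 0; i < sizeof basic_exec; ++i) {
        char c = basic_exec[i];
        if (c < 0)
            return false;               /* must be non-negative */
        if (seen[(unsigned char)c])
            return false;               /* values must be distinct */
        seen[(unsigned char)c] = true;
    }
    return true;
}
```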

      In C++, the basic execution character set is also known as the basic literal character set and the basic execution wide-character set.

      Literal encodings

      The literal encoding is an implementation-defined mapping of the characters of the execution character set to the values in a character constant or string literal without an encoding prefix. It supports a mapping from all the basic execution character set values into the implementation-defined encoding. The encoding may contain multibyte character sequences.
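The byte layout chosen by the literal encoding can be inspected directly. The sketch below prints the bytes of an ordinary string literal in hex and returns how many there are. `dump_literal` and the `DUMP` macro are hypothetical names. For a literal made only of basic characters, the count equals the number of characters. For a literal such as "\u00E9" the count is implementation-defined: it is 2 under a UTF-8 literal encoding (GCC's default `-fexec-charset` is UTF-8) and 1 under a Latin-1 one.

```c
#include <stddef.h>
#include <stdio.h>

/* Prints each byte of a string literal of the given size (including the
   terminating '\0') in hex and returns the number of bytes before it. */
size_t dump_literal(const char *s, size_t size)
{
    size_t n = size - 1;                /* drop the terminating '\0' */
    for (size_t i = 0; i < n; ++i)
        printf("%02X ", (unsigned)(unsigned char)s[i]);
    putchar('\n');
    return n;
}

/* Passes the literal's array size, which sizeof can see only at the call site. */
#define DUMP(lit) dump_literal((lit), sizeof(lit))
```

For example, `DUMP("Hello")` always reports 5 bytes. `DUMP("\u00E9")` shows the implementation's own choice.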

      Since C23, the following characters, which are not in the basic execution character set, are also required to be encoded as a single byte in an ordinary character constant or ordinary string literal:

      Code point          Character                     Glyph
      U+0024              Dollar sign                   $
      U+0040              Commercial at                 @
      U+0060              Grave accent                  `
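Because these three characters must be single-byte from C23 onward, a compile-time check like the following holds on any conforming C23 implementation. On older compilers it usually holds on ASCII-based systems, but it was not guaranteed. `extras` and `extras_length` are hypothetical names used only for this sketch.

```c
#include <stddef.h>

/* Since C23, '$', '@' and '`' each occupy exactly one byte in an ordinary
   string literal, so this literal is 3 bytes plus the terminating '\0'. */
_Static_assert(sizeof "$@`" == 4, "$, @ and ` must be single-byte");

static const char extras[] = "$@`";

/* Returns the number of bytes in extras, excluding the terminator. */
size_t extras_length(void)
{
    return sizeof extras - 1;
}
```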

      The wide literal encoding is an implementation-defined mapping of the characters of the execution character set to the values in an L-prefixed character constant or string literal. It supports a mapping from all the basic execution character set values into the implementation-defined encoding. If an implementation does not define __STDC_MB_MIGHT_NEQ_WC__, the mapping produces values identical to the literal encoding for all the basic execution character set values. One or more values may map to one or more values of the extended execution character set.
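When `__STDC_MB_MIGHT_NEQ_WC__` is not defined, L'x' == 'x' for every basic character x. The sketch below encodes that guarantee as a compile-time check guarded by the macro. It also compares an ordinary literal and an L-prefixed literal element by element. `narrow`, `wide` and `wide_matches_narrow` are hypothetical names, and the literals contain only basic characters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

#ifndef __STDC_MB_MIGHT_NEQ_WC__
/* Guaranteed only when the macro is absent: basic characters have the
   same value in the wide literal encoding as in the literal encoding. */
_Static_assert(L'A' == 'A', "wide and narrow basic values agree");
#endif

/* The same basic characters as an ordinary and as an L-prefixed literal. */
static const char    narrow[] = "Hello, world! {[(0123456789)]} ~_^";
static const wchar_t wide[]   = L"Hello, world! {[(0123456789)]} ~_^";

/* Returns true if every element of wide equals the corresponding
   element of narrow, converted to wchar_t. */
bool wide_matches_narrow(void)
{
    for (size_t i = 0; i < sizeof narrow; ++i)
        if (wide[i] != (wchar_t)narrow[i])
            return false;
    return true;
}
```

On an implementation that defines `__STDC_MB_MIGHT_NEQ_WC__`, `wide_matches_narrow()` may legitimately return false.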

      The UTF-8 encoding is used for mapping characters of the execution character set to a u8-prefixed character constant (since C23) or string literal.
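Because a u8 string literal is always UTF-8, whatever the ordinary literal encoding, its bytes can be relied on. The sketch below checks the bytes of U+20AC EURO SIGN (E2 82 AC). It also counts code points by skipping UTF-8 continuation bytes, which have the form 10xxxxxx. `euro` and `utf8_code_points` are hypothetical names, and the function assumes valid UTF-8 input.

```c
#include <stddef.h>

/* The cast compiles both before C23 (u8"" is an array of char) and in
   C23 (u8"" is an array of char8_t, i.e. unsigned char). */
static const unsigned char *const euro = (const unsigned char *)u8"\u20AC";

/* Counts code points in a NUL-terminated UTF-8 string by counting every
   byte that is not a continuation byte (10xxxxxx). */
size_t utf8_code_points(const unsigned char *s)
{
    size_t n = 0;
    for (; *s != 0; ++s)
        if ((*s & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For instance, u8"\u00E9\u20AC" occupies 6 bytes: 2 for U+00E9, 3 for U+20AC and 1 for the terminator.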

      The UTF-16 encoding (since C23; an implementation-defined encoding until C23) is used for mapping characters of the execution character set to a u-prefixed character constant or string literal.
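Under UTF-16, a code point above U+FFFF is stored as a surrogate pair of two char16_t units. For example, u"\U0001F600" holds 0xD83D 0xDE00 followed by the terminator. This layout is guaranteed only under C23 or when `__STDC_UTF_16__` is defined. The sketch below decodes one code point. `utf16_decode` is a hypothetical helper that reports no errors: a lone surrogate is returned as-is.

```c
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>

/* Decodes one code point from a NUL-terminated UTF-16 string and stores
   the number of code units consumed in *used. */
uint_least32_t utf16_decode(const char16_t *s, size_t *used)
{
    uint_least32_t hi = s[0];
    /* A high surrogate followed by a low surrogate forms one code point.
       s[1] is only read when s[0] is a high surrogate, i.e. not the NUL. */
    if (hi >= 0xD800 && hi <= 0xDBFF && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
        *used = 2;
        return 0x10000 + ((hi - 0xD800) << 10) + (s[1] - 0xDC00u);
    }
    *used = 1;
    return hi;
}
```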

      The UTF-32 encoding (since C23; an implementation-defined encoding until C23) is used for mapping characters of the execution character set to a U-prefixed character constant or string literal.

      The u8, u, and U encoding prefixes are available since C11 (u8 character constants only since C23).
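Under UTF-32, each code point occupies exactly one char32_t, so U'\U0001F600' is simply the value 0x1F600. As a sketch of how such a value relates to UTF-8, the hypothetical helper `utf32_to_utf8` encodes one code point into at most 4 bytes. It returns 0 for surrogates and values above U+10FFFF, which are not valid Unicode scalar values.

```c
#include <stddef.h>
#include <uchar.h>

/* Encodes one code point as UTF-8 into out[] and returns the number of
   bytes written, or 0 if cp is a surrogate or beyond U+10FFFF. */
size_t utf32_to_utf8(char32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not scalar values */
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U'\u20AC' encodes as E2 82 AC, and U'\U0001F600' encodes as F0 9F 98 80.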

      See also

      ASCII chart
      C++ documentation for Character sets and encodings
      Retrieved from "https://en.cppreference.com/mwiki/index.php?title=c/language/charset&oldid=151269"
