
This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 119a. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2025-12-20

1103. Reversion of phase 1 and 2 transformations in raw string literals

Section: 5.2 [lex.phases]   Status: C++11   Submitter: US   Date: 2010-08-02

[Voted into the WP at the November, 2010 meeting.]

N3092 comment US 13
N3092 comment US 14

“Raw” strings are still only Pittsburgh-rare strings: the reversion in phase 3 only applies to an r-char-sequence. It should apply to the entire raw string literal.

Proposed resolution (August, 2010):

Change 5.2 [lex.phases] paragraph 1 phase 1 as follows:

...(An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently except where this replacement is reverted in a raw string literal.).)

Change 5.2 [lex.phases] paragraph 1 phase 3 as follows:

...[Example: see the handling of < within a #include preprocessing directive. —end example] ~~Within the r-char-sequence of a raw string literal, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted.~~

Change 5.2 [lex.phases] paragraph 1 phase 5 as follows:

Each source character set member ~~and universal-character-name~~ in a character literal or a string literal, as well as each escape sequence and universal-character-name in a character literal or a non-raw string literal, is converted to the corresponding member of the execution character set (5.13.3 [lex.ccon], 5.13.5 [lex.string]); if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.

Change 5.3.1 [lex.charset] paragraph 2 as follows:

...Additionally, if the hexadecimal value for a universal-character-name outside the c-char-sequence, s-char-sequence, or r-char-sequence of a character or string literal corresponds to a control character (in either of the ranges 0x00–0x1F or 0x7F–0x9F, both inclusive) or to a character in the basic source character set, the program is ill-formed. [Footnote: A sequence of characters resembling a universal-character-name in an r-char-sequence (5.13.5 [lex.string]) does not form a universal-character-name. —end footnote]
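The effect of the footnote can be checked with a small compile-time sketch. The name `raw_ucn` and the helper `raw_ucn_length` are hypothetical, chosen only for illustration; the sketch assumes a C++11 (or later) compiler and is not part of the proposed wording.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration: the raw string below contains the six
// characters backslash, u, 0, 0, E, 9. Because they appear inside an
// r-char-sequence, they do not form a universal-character-name.
constexpr char raw_ucn[] = R"(\u00E9)";

// Six characters plus the terminating null character.
static_assert(sizeof(raw_ucn) == 7, "six characters plus the terminating null");
static_assert(raw_ucn[0] == '\\', "first character is a literal backslash");
static_assert(raw_ucn[1] == 'u', "second character is a literal u");

// Length of the raw string without the terminating null character.
constexpr std::size_t raw_ucn_length() { return sizeof(raw_ucn) - 1; }
```

In a non-raw string literal the same spelling would instead be converted in phase 5 to a member of the execution character set, so its length there depends on the implementation's encoding.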

Change 5.5 [lex.pptoken] paragraph 3 as follows:

If the input stream has been parsed into preprocessing tokens up to a given character:
~~if~~If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal~~;~~. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 (trigraphs, universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern
  encoding-prefix_opt R raw-string
~~otherwise~~Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail.

Delete footnote 24 in 5.13.5 [lex.string] paragraph 2:

~~Use of characters with trigraph equivalents in a d-char-sequence may produce unintended results.~~

Insert the following examples after 5.13.5 [lex.string] paragraph 4:

[Example: The raw string
  R"a(
  )\
  a"
  )a"
is equivalent to "\n)\\\na\"\n". The raw string
  R"(??)"
is equivalent to "\?\?". The raw string
  R"#(
  )??="
  )#"
is equivalent to "\n)\?\?=\"\n". —end example]
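The three equivalences in the example can be exercised with a short sketch like the one below. All variable and function names are hypothetical. The sketch assumes a C++11 (or later) compiler that implements the reversion described in this resolution. The raw strings start in column one so that no indentation becomes part of their contents.

```cpp
#include <cassert>
#include <string>

// Hypothetical names, for illustration only.

// A backslash followed by a newline inside the raw string is not spliced:
// both characters stay in the string.
const std::string raw_splice = R"a(
)\
a"
)a";

// Two question marks followed by a closing parenthesis would be a trigraph
// outside a raw string; here the question marks are kept as written.
const std::string raw_trigraph = R"(??)";

// The delimiter is a number sign, which has a trigraph spelling. The
// trigraph-like sequence in the body must be reverted before the closing
// delimiter is looked for, otherwise the literal would end one line early.
const std::string raw_delim = R"#(
)??="
)#";

// Escaped equivalents as given in the example.
const std::string esc_splice = "\n)\\\na\"\n";
const std::string esc_trigraph = "\?\?";
const std::string esc_delim = "\n)\?\?=\"\n";

// Returns true when each raw string matches its escaped equivalent.
bool examples_agree() {
    return raw_splice == esc_splice &&
           raw_trigraph == esc_trigraph &&
           raw_delim == esc_delim;
}
```

Calling `examples_agree()` from a test or from `main` is expected to yield `true` on a conforming implementation, whether or not trigraph replacement is enabled.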
