
This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 119a. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.

2025-12-20

411. Use of universal-character-name in character versus string literals

Section: 5.13.5 [lex.string] Status: CD6 Submitter: James Kanze Date: 23 Apr 2003

[Accepted at the November, 2020 meeting as part of paper P2029R4.]

5.13.5 [lex.string] paragraph 5 reads

Escape sequences and universal-character-names in string literals have the same meaning as in character literals, except that the single quote ' is representable either by itself or by the escape sequence \', and the double quote " shall be preceded by a \. In a narrow string literal, a universal-character-name may map to more than one char element due to multibyte encoding.

The first sentence refers us to 5.13.3 [lex.ccon], where we read in the first paragraph that "An ordinary character literal that contains a single c-char has type char [...]." Since the grammar shows that a universal-character-name is a c-char, something like '\u1234' must have type char (and thus be a single char element); in paragraph 5, we read that "A universal-character-name is translated to the encoding, in the execution character set, of the character named. If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding."

This is in obvious contradiction with the second sentence. In addition, I'm not really clear what is supposed to happen in the case where the execution (narrow-)character set is UTF-8. Consider the character \u0153 (the oe ligature in the French word oeuvre). Should '\u0153' be a char, with an "error" value, say '?' (in conformance with the requirement that it be a single char), or an int, with the two char values 0xC5, 0x93, in an implementation-defined order (in conformance with the requirement that a character representable in the execution character set be represented)? Supposing the former, should "\u0153" be the equivalent of "?" (in conformance with the first sentence), or "\xC5\x93" (in conformance with the second)?

Notes from October 2003 meeting:

We decided we should forward this to the C committee and let them resolve it. Sent via e-mail to John Benito on November 14, 2003.

Reply from John Benito:

I talked this over with the C project editor, we believe this was handled by the C committee before publication of the current standard.
WG14 decided there needed to be a more restrictive rule for one-to-one mappings: rather than saying "a single c-char" as C++ does, the C standard says "a single character that maps to a single-byte execution character"; WG14 fully expect some (if not many or even most) UCNs to map to multiple characters.
Because of the fundamental differences between C and C++ character types, I am not sure the C committee is qualified to answer this satisfactorily for WG21. WG14 is willing to review any decision reached for compatibility.
I hope this helps.

(See also issue 912 for a related question.)
