This is an unofficial snapshot of the ISO/IEC JTC1 SC22 WG21 Core Issues List revision 119a. See http://www.open-std.org/jtc1/sc22/wg21/ for the official list.
2025-12-20
[Voted into WP at March, 2010 meeting.]
There are several instances of undefined behavior in lexical processing:
5.2 [lex.phases] paragraph 1, phase 2: a universal-character-name resulting from a line splice.
5.2 [lex.phases] paragraph 1, phase 2: a file ending without a new-line character or with a new-line character that is spliced away.
5.2 [lex.phases] paragraph 1, phase 4: a universal-character-name resulting from macro token concatenation.
5.6 [lex.header] paragraph 2: ', \, /*, //, or " appearing in a header-name.
These would be more appropriately handled as conditionally-supported behavior, requiring implementations either to document their handling of these constructs or to issue a diagnostic.
Additional note, March, 2009:
The undefined behavior referred to above regarding universal-character-names is the result of the considerations described in the C99 Rationale, section 5.2.1, in the part entitled “UCN models.” Three different models for support of UCNs are described, each involving different conversions between UCNs and wide characters and/or at different times during program translation. Implementations, as well as the specification in a language standard, can employ any of the three, but it must be impossible for a well-defined program to determine which model was actually employed by the implementation. The implication of this “equivalence principle” is that any construct that would give different results under the different models must be classified as undefined behavior. For example, an apparent UCN resulting from a line-splice would be recognized as a UCN by an implementation in which all wide characters were translated immediately into UCNs, as described in C++ phase 1, but would not be recognized as a UCN by another implementation in which all UCNs were translated immediately into wide characters (a possibility mentioned parenthetically in C++ phase 1).
There are additional implications for this “equivalence principle” beyond the ones identified in the UK CD comments. See also issue 578; presumably a string like the one in that issue should also be described as having undefined behavior. Also, because C++'s model introduces backslash characters as part of UCNs for any character outside the basic source character set, any header-name that contains such a character (e.g., #include "@.h") will have undefined behavior in C++. This is also the reason that UCNs are translated into wide characters inside raw strings: two of the three models articulated in the C99 Rationale translate to or from UCNs in phase 1, before raw strings are recognized as tokens in phase 3, so raw strings cannot treat UCNs differently from the way they are treated in other contexts. See also issue 789 for similar points regarding trigraphs.
Notes from the October, 2009 meeting:
The CWG decided that the non-UCN aspects of this issue should be resolved, while the overall questions regarding trigraphs, UCNs, and raw strings will be investigated separately.
Proposed resolution (February, 2010):
Change 5.2 [lex.phases] paragraph 1 phase 2 as follows (deleted text is shown as [-...-], added text as {+...+}):
...[-If a-]{+A+} source file that is not empty {+and that+} does not end in a new-line character, or {+that+} ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, [-the behavior is undefined-]{+shall be processed as if an additional new-line character were appended to the file+}.
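The revised phase 2 wording can be sketched as follows. This is a minimal illustration, not an implementation taken from any compiler: the function name `phase2` is hypothetical, the whole file is assumed to be held in one string, and only `'\n'` is treated as a new-line character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of phase 2 under the revised wording.
std::string phase2(std::string src) {
    // The rule applies only to a source file that is not empty.
    if (src.empty()) return src;

    // Both conditions are evaluated before any splicing takes place.
    const std::size_t n = src.size();
    const bool no_final_newline = src[n - 1] != '\n';
    const bool final_newline_would_be_spliced =
        n >= 2 && src[n - 1] == '\n' && src[n - 2] == '\\';

    // Process the file as if an additional new-line character were appended.
    if (no_final_newline || final_newline_would_be_spliced) {
        src += '\n';
    }

    // Line splicing: delete each backslash immediately followed by a new-line.
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // drop the new-line together with the backslash
        } else {
            out += src[i];
        }
    }
    return out;
}
```

With this sketch, `phase2("int x;")` and `phase2("int x;\\\n")` both yield `"int x;\n"`, the two cases that previously had undefined behavior, while an empty file is left empty.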
Change 5.6 [lex.header] paragraph 2 as follows (deleted text is shown as [-...-], added text as {+...+}):
[-If-]{+The appearance of+} either of the characters ' or \[-,-] or {+of+} either of the character sequences /* or // [-appears-] in a q-char-sequence or [-a-]{+an+} h-char-sequence {+is conditionally-supported with implementation-defined semantics+}, [-or-]{+as is the appearance of+} the character " [-appears-] in [-a-]{+an+} h-char-sequence[-, the behavior is undefined-]. [Footnote: Thus, [-sequences of characters that resemble escape sequences cause undefined behavior-]{+a sequence of characters that resembles an escape sequence might result in an error, be interpreted as the character corresponding to the escape sequence, or have a completely different meaning, depending on the implementation+}. —end footnote]
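As an illustration of which header-names fall under the revised wording, the check below flags the characters and character sequences it lists. It is a sketch only: `HeaderKind` and `is_conditionally_supported` are hypothetical names, and the argument is assumed to be the text between the delimiters, without the surrounding quotes or angle brackets.

```cpp
#include <string>

// Hypothetical tag: "q-char-sequence" versus <h-char-sequence>.
enum class HeaderKind { Quoted, Angled };

// Returns true if the sequence contains a construct that the revised
// wording makes conditionally-supported with implementation-defined
// semantics.
bool is_conditionally_supported(const std::string& chars, HeaderKind kind) {
    const auto npos = std::string::npos;

    // ' or \ or /* or // in either kind of sequence.
    if (chars.find('\'') != npos || chars.find('\\') != npos ||
        chars.find("/*") != npos || chars.find("//") != npos) {
        return true;
    }

    // The character " only matters in an h-char-sequence; in a
    // q-char-sequence it would terminate the header-name instead.
    return kind == HeaderKind::Angled && chars.find('"') != npos;
}
```

For example, `sys\types.h` in either form and `a"b` in the angled form are flagged, so an implementation must either document how it treats them or issue a diagnostic, whereas `stdio.h` is not flagged.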