ISO/IEC JTC1 SC22 WG21
N3981
Richard Smith
2014-05-06
We examined the uses of trigraph-like constructs in one large codebase. We discovered:
string pattern() const { return "foo-????\?-of-?????"; }
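The `\?` in this literal exists only to prevent a trigraph. Without it, the run `?????-` would contain `??-`, which C++14 phase 1 replaces with `~`. The sketch below assumes a C++17 compiler (for `constexpr std::string_view`) and a hypothetical helper `count_of`. It checks that the escaped literal holds all ten question marks and no tilde:

```cpp
#include <cstddef>
#include <string_view>

// The literal from the codebase above. The "\?" escape splits the run of
// question marks so that no '?', '?', '-' sequence is formed in phase 1.
constexpr std::string_view pattern = "foo-????\?-of-?????";

// Counts occurrences of ch in s (std::count is not constexpr until C++20).
constexpr std::size_t count_of(std::string_view s, char ch) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ch) ++n;
    }
    return n;
}

static_assert(pattern.size() == 18, "4 + 5 + 4 + 5 characters");
static_assert(count_of(pattern, '?') == 10, "every question mark survives");
static_assert(pattern.find('~') == std::string_view::npos, "no trigraph was formed");
```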
Trigraphs continue to pose a burden on users of C++.
Trigraphs are handled in the first phase of translation:
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Trigraph sequences (2.4) are replaced by corresponding single-character internal representations.
Note that the mapping from physical source file characters to the basic source character set is implementation-defined. If trigraphs are removed from the language entirely, an implementation that wishes to support them can continue to do so: its implementation-defined mapping from physical source file characters to the basic source character set can include trigraph translation (and can even avoid doing so within raw string literals). We do not need trigraphs in the standard for backwards compatibility.
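As the preceding paragraph notes, trigraph support can live entirely inside the implementation-defined phase-1 mapping. The sketch below shows what such a mapping step might look like. `replace_trigraphs` and `trigraph_target` are hypothetical helpers, not part of any standard library or compiler. They apply the nine C++14 trigraph replacements using a left-to-right, non-overlapping scan. The sketch does not model the raw-string-literal exemption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Returns the character that "??" followed by `third` stands for, or '\0'
// if that three-character sequence is not a trigraph.
constexpr char trigraph_target(char third) {
    switch (third) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical phase-1 helper: scans left to right and replaces each trigraph
// with its single character. A '?' that does not start a trigraph is copied
// unchanged, and scanning resumes at the next character.
std::string replace_trigraphs(std::string_view src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size();) {
        if (i + 2 < src.size() && src[i] == '?' && src[i + 1] == '?') {
            if (char t = trigraph_target(src[i + 2])) {
                out += t;
                i += 3;
                continue;
            }
        }
        out += src[i];
        ++i;
    }
    return out;
}
```

Because the scan resumes one character after a `?` that does not begin a trigraph, five question marks followed by `-` become three question marks followed by `~`. This is the case that the `\?` escape in the codebase example guards against.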
This paper proposes that trigraphs be removed entirely.
Change in 2.2 (lex.phases) paragraph 1 bullet 1:
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. [delete: Trigraph sequences (2.4) are replaced by corresponding single-character internal representations.] Any source file character not in the basic source character set (2.3) is replaced by […]
Delete subclause 2.4 (lex.trigraph) "Trigraph sequences"
Change in 2.5 (lex.pptoken) paragraph 3 bullet 1:
If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 ([delete: trigraphs,] universal-character-names[delete: ,] and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. […]
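Because the phase 1 and 2 transformations are reverted inside a raw string literal, question marks can be written there without escapes, even under C++14. The check below assumes C++17 for `constexpr std::string_view` comparison. It confirms that a raw literal and the escaped ordinary literal quoted at the start of this paper contain the same characters:

```cpp
#include <string_view>

// Ordinary literal: the "\?" escape is needed to avoid forming a trigraph
// in implementations that still perform trigraph replacement.
constexpr std::string_view escaped = "foo-????\?-of-?????";

// Raw literal: trigraph replacement (where performed) is reverted between
// the delimiting parentheses, so no escape is needed.
constexpr std::string_view raw = R"(foo-?????-of-?????)";

static_assert(escaped == raw, "both spell foo-, five '?', -of-, five '?'");
```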
Change footnote 24 in 2.14.3 (lex.ccon) paragraph 3:
Using an escape sequence for a question mark [delete: can avoid accidentally creating a trigraph] [insert: is supported for compatibility with ISO C++14 and ISO C].
Add a subclause to Annex C:
Clause 2: lexical conventions [diff.cpp14.lex]
Change: Removal of trigraphs.
Rationale: Undesirable feature that prevents some uses of ?? in non-raw string literals and comments.
Effect on original feature: Valid C++ 2014 code that uses trigraphs may not be valid or may have different semantics. Implementations may choose to provide trigraphs as part of the implementation-defined mapping from physical source file characters to the basic source character set, but are encouraged not to do so.
No feature test macro is provided for this feature. Code that wishes to be portable between implementations that provide trigraphs and those that do not should avoid using basic source character sequences containing trigraphs.
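A project that must build under both kinds of implementation could enforce this guidance mechanically. The hypothetical `contains_trigraph` helper below is an illustration, not an existing tool. It reports whether a piece of source text contains any of the nine trigraph sequences, that is, `??` followed by one of `= / ' ( ) ! < > -`. It scans plain text and does not recognize raw string literals. As a result, it may flag sequences inside raw literals, which are actually safe. The sketch assumes C++17.

```cpp
#include <cstddef>
#include <string_view>

// True if `c` can follow "??" to form a trigraph.
constexpr bool is_trigraph_third(char c) {
    return std::string_view("=/'()!<>-").find(c) != std::string_view::npos;
}

// Hypothetical lint check: true if `text` contains a trigraph sequence, whose
// meaning would differ between implementations that do and do not replace
// trigraphs.
constexpr bool contains_trigraph(std::string_view text) {
    for (std::size_t i = 0; i + 2 < text.size(); ++i) {
        if (text[i] == '?' && text[i + 1] == '?' && is_trigraph_third(text[i + 2]))
            return true;
    }
    return false;
}
```

The test strings passed to such a checker should spell question marks as `\?` so that the checker's own source contains no trigraphs.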