Digraphs and trigraphs (programming)

From Wikipedia, the free encyclopedia

Two or three characters, treated as one

For other uses, see Digraph and Trigraph.

In computer programming, digraphs and trigraphs are sequences of two and three characters, respectively, that appear in source code and, according to a programming language's specification, should be treated as if they were single characters.

Various reasons exist for using digraphs and trigraphs: keyboards may not have keys to cover the entire character set of the language, input of special characters may be difficult, text editors may reserve some characters for special use, and so on. Trigraphs might also be used for some EBCDIC code pages that lack characters such as { and }.

History

The basic character set of the C programming language is a subset of the ASCII character set that includes nine characters which lie outside the ISO 646 invariant character set. This can pose a problem for writing source code when the encoding (and possibly keyboard) being used does not support one or more of these nine characters. The ANSI C committee invented trigraphs as a way of entering source code using keyboards that support any national version of the ISO 646 character set.^[1]

With the widespread adoption of ASCII and Unicode/UTF-8, trigraph use is limited today, and trigraph support has been removed from C as of C23.^[2]

Implementations

Trigraphs are not commonly encountered outside compiler test suites.^[3] Some compilers support an option to turn recognition of trigraphs off, or disable trigraphs by default and require an option to turn them on. Some can issue warnings when they encounter trigraphs in source files. Borland supplied a separate program, the trigraph preprocessor (TRIGRAPH.EXE), to be used only when trigraph processing is desired (the rationale was to maximise speed of compilation).

Language support

Different systems define different sets of digraphs and trigraphs, as described below.

ALGOL

Early versions of ALGOL predated the standardized ASCII and EBCDIC character sets, and were typically implemented using a manufacturer-specific six-bit character code. A number of ALGOL operations either lacked codepoints in the available character set or were not supported by peripherals, leading to a number of substitutions including := for ← (assignment) and >= for ≥ (greater than or equal).

Pascal

The Pascal programming language supports the digraphs (., .), (* and *) for [, ], { and } respectively. Unlike all other cases mentioned here, (* and *) were and still are in wide use. However, many compilers treat them as a different type of commenting block rather than as actual digraphs; that is, a comment started with (* cannot be closed with } and vice versa.

J

The J programming language is a descendant of APL but uses the ASCII character set rather than APL symbols. Because the printable range of ASCII is smaller than APL's specialized set of symbols, the . (dot) and : (colon) characters are used to inflect ASCII symbols, effectively interpreting unigraphs, digraphs or rarely trigraphs as standalone "symbols".^[4]

Unlike the use of digraphs and trigraphs in C and C++, there are no single-character equivalents to these in J.

C

See also: C alternative tokens

Trigraph	Equivalent
`??=`	`#`
`??/`	`\`
`??'`	`^`
`??(`	`[`
`??)`	`]`
`??!`	`|`
`??<`	`{`
`??>`	`}`
`??-`	`~`

The C preprocessor (used for C and, with slight differences, in C++; see below) replaces all occurrences of the nine trigraph sequences in this table by their single-character equivalents before any other processing (until C23^[5]).^[6]^[7]
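As an illustration of this replacement step, the following C sketch scans a string from left to right and substitutes each of the nine sequences from the table. The function name replace_trigraphs and the two lookup strings are illustrative assumptions and not part of any compiler or standard library; a real preprocessor applies the replacement to the source file itself, before any other processing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Third characters of the nine trigraphs and their single-character
   equivalents, in the same order as the table. */
static const char TRIGRAPH_THIRD[] = "=/'()!<>-";
static const char TRIGRAPH_REPL[]  = "#\\^[]|{}~";

/* Hypothetical helper: copies src to dst, replacing every trigraph by its
   equivalent. dst must have room for at least strlen(src) + 1 bytes.
   Returns the length of the resulting string. */
size_t replace_trigraphs(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    for (size_t i = 0; src[i] != '\0'; ) {
        const char *hit = NULL;
        /* Two question marks followed by one of the nine third characters. */
        if (src[i] == '?' && src[i + 1] == '?' && src[i + 2] != '\0')
            hit = strchr(TRIGRAPH_THIRD, src[i + 2]);
        if (hit != NULL) {
            dst[out++] = TRIGRAPH_REPL[hit - TRIGRAPH_THIRD];
            i += 3;               /* consume the whole trigraph */
        } else {
            dst[out++] = src[i++]; /* ordinary character, copied unchanged */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Under these assumptions, the input ??=define yields #define. Test inputs for such a helper are best spelled with the escape sequence \? between the question marks, so that a compiler with trigraph support enabled does not alter the literals before the helper sees them.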

A programmer may want to place two question marks together yet not have the compiler treat them as introducing a trigraph. The C grammar does not permit two consecutive ? tokens, so the only places in a C file where two question marks in a row may be used are in multi-character constants, string literals, and comments. This is particularly a problem for the classic Mac OS, where the constant '????' may be used as a file type or creator.^[8] To safely place two consecutive question marks within a string literal, the programmer can use string concatenation "...?""?..." or an escape sequence "...?\?...".
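Both workarounds can be sketched in a few lines of C. The identifiers via_concatenation, via_escape and spellings_agree are hypothetical names chosen for this example; the point is only that the two spellings denote the same string while never placing two question marks side by side in the source text.

```c
#include <assert.h>
#include <string.h>

/* Adjacent string literals are joined only after trigraph replacement,
   so these two question marks are never adjacent in the source text. */
static const char via_concatenation[] = "What?" "?!";

/* The escape sequence \? stands for a plain question mark. */
static const char via_escape[] = "What?\?!";

/* Returns 1 when both spellings yield the same seven-character string. */
int spellings_agree(void)
{
    assert(sizeof via_escape == 8);   /* seven characters plus the terminator */
    return strcmp(via_concatenation, via_escape) == 0;
}
```

Written directly as a single literal, the same text would end in the trigraph for the vertical bar and would be altered on a compiler that performs trigraph replacement.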

??? is not itself a trigraph sequence, but when followed by a character such as - it will be interpreted as ? + ??-, which becomes ?~.

The ??/ trigraph can be used to introduce an escaped newline for line splicing; this must be taken into account for correct and efficient handling of trigraphs within the preprocessor. It can also cause surprises, particularly within comments. For example:

    // This is all just one comment!???/
    a++;

which is a single logical comment line (used in C++ and C99), and

    /??/
    * A comment *??/
    /

which is a correctly formed block comment. The concept can be used to check for trigraphs as in the following C99 example, where only one return statement will be executed.

    #include <stdbool.h>

    // returns false or true; language standard C99 or later
    bool trigraphsAvailable(void) {
        // are trigraphs available??/
        return false;
        return true;
    }

Alternative digraphs introduced in the C standard in 1994
Digraph	Equivalent
`<:`	`[`
`:>`	`]`
`<%`	`{`
`%>`	`}`
`%:`	`#`

In 1994, a normative amendment to the C standard, C95,^[9]^[10] included in C99, supplied digraphs as more readable alternatives to five of the trigraphs.

Unlike trigraphs, digraphs are handled during tokenization, and any digraph must always represent a full token by itself, or compose the token %:%: replacing the preprocessor concatenation token ##. If a digraph sequence occurs inside another token, for example a quoted string or a character constant, it will not be replaced.
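The following C sketch spells every bracket, brace and number sign with the digraphs from the table. The names CONCAT, sum, forty_two and literal_length are made up for this illustration, and the example assumes a compiler that accepts the 1994 amendment or a later standard.

```c
%:include <assert.h>
%:include <string.h>

/* Token pasting spelled with the four-character alternative token. */
%:define CONCAT(a, b) a %:%: b

/* Sums the first n elements; brackets and braces are written as digraphs. */
int sum(const int values<: :>, int n)
<%
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += values<:i:>;
    return total;
%>

/* CONCAT(4, 2) pastes the tokens 4 and 2 into the constant 42. */
int forty_two(void)
<%
    return CONCAT(4, 2);
%>

/* Inside a string literal the pair is ordinary text: two characters,
   not a single bracket. */
size_t literal_length(void)
<%
    return strlen("<:");
%>
```

Because the substitution happens per token, literal_length reports two characters for the quoted pair, whereas the same pair outside the string acts as an opening bracket.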

C++

See also: C alternative tokens

C++ (through C++14, see below) behaves like C, including the C99 additions.^[11]

As a note, %:%: is treated as a single token, rather than two occurrences of %:.

In the sequence <::, if the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token <:. This is done so certain uses of templates are not broken by the substitution.

The C++ Standard makes this comment with regard to the term "digraph":^[12]

The term "digraph" (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters. Nonetheless, those alternative tokens that aren't lexical keywords are colloquially known as "digraphs".

Trigraphs were proposed for deprecation in C++0x, which was released as C++11.^[13] This was opposed by IBM, speaking on behalf of itself and other users of C++,^[14] and as a result trigraphs were retained in C++11. Trigraphs were then proposed again for removal (not only deprecation) in C++17.^[15] This passed a committee vote, and trigraphs (but not the additional tokens) were removed from C++17 despite the opposition from IBM.^[16] Existing code that uses trigraphs can be supported by translating from the source files (parsing trigraphs) to the basic source character set that does not include trigraphs.^[15]

RPL

Hewlett-Packard calculators supporting the RPL language and input method provide support for a large number of trigraphs (also called TIO codes) to reliably transcribe non-seven-bit ASCII characters of the calculators' extended character set^[17]^[18]^[19] on foreign platforms, and to ease keyboard input without using the CHARS application.^[20]^[21]^[18]^[19] The first character of all TIO codes is a \, followed by two other ASCII characters vaguely resembling the glyph to be substituted.^[20]^[21]^[18]^[19]^[22] All other characters can be entered using the special \nnn TIO code syntax with nnn being a three-digit decimal number (with leading zeros if necessary) of the corresponding code point (thereby formally representing a tetragraph).^[20]^[18]^[19]

Application support

Vim

The Vim text editor supports digraphs for actual entry of text characters, following RFC 1345. The entry of digraphs is bound to Ctrl+K by default.^[23] The list of all possible digraphs in Vim can be displayed by typing :dig.

GNU Screen

GNU Screen has a digraph command, bound to Ctrl+A Ctrl+V by default.^[24]

Lotus

Lotus 1-2-3 for MS-DOS uses Alt+F1 as a compose key to allow easier input of many special characters of the Lotus International Character Set (LICS)^[25] and Lotus Multi-Byte Character Set (LMBCS).

See also

AltGr key – Modifier key on some computer keyboards
C alternative tokens – C standard library header providing a set of alternative spellings of common operators
Compose key – Computer key to initiate glyph merger
Dead key – Special kind of modifier keyboard key
Escape sequence – Series of characters with a special meaning
Escape sequences in C – Special character sequences in the C programming language
List of XML and HTML character entity references

References

1. Rationale for International Standard—Programming Languages—C (PDF). Revision 5.10. pp. 20–21.
2. "Removing trigraphs??!" (PDF).
3. Jones, Derek M. "Sentence 117". The New C Standard: An Economic and Cultural Commentary.
4. Hui, Roger. "Vocabulary". jsoftware.com. Archived from the original on 2019-04-02. Retrieved 2015-04-16.
5. "Removing trigraphs??!" (PDF).
6. British Standards Institute (2003). The C Standard - Incorporating TC1 - BS ISO/IEC 9899:1999. John Wiley & Sons. ISBN 0-470-84573-2.
7. "Rationale for International Standard - Programming Languages - C" (PDF). 5.10. April 2003. Archived (PDF) from the original on 2016-06-06. Retrieved 2010-10-17.
8. "File Basics". whitefiles.org. Retrieved 2024-05-08.
9. ISO/IEC 9899:1990/Amd 1:1995 - Programming languages — C — Amendment 1: C Integrity. March 1995. Retrieved 2024-05-30.
10. Clive D.W. Feather (2010-09-12). "A brief description of Normative Addendum 1".
11. Stroustrup, Bjarne (1994-03-29). Design and Evolution of C++ (1 ed.). Addison-Wesley Publishing Company. ISBN 0-201-54330-3.
12. Du Toit, Stefanus, ed. (2012-01-16). "Working Draft, Standard for Programming Language C++" (PDF). N3337. Archived (PDF) from the original on 2019-05-08. Retrieved 2019-05-08.
13. "C++0X, CD 1, National Body Comments" (PDF). 2009-01-30. SC22/WG21 N2837 comment UK 11. Archived (PDF) from the original on 2017-08-01. Retrieved 2019-05-12.
14. Wong, Michael; Tong, Hubert; Klarer, Robert; McIntosh, Ian; Mak, Raymond; Cambly, Christopher; LaBonté, Alain (2009-06-19). "Comment on Proposed Trigraph Deprecation" (PDF). N2910. Archived (PDF) from the original on 2017-08-01. Retrieved 2019-05-12.
15. Smith, Richard (2014-05-06). "Removing trigraphs??!". N3981. Archived from the original on 2018-07-09. Retrieved 2019-05-12.
16. Wong, Michael; Tong, Hubert; Bhakta, Rajan; Inglis, Derek (2014-10-10). "IBM comment on preparing for a Trigraph-adverse future in C++17" (PDF). IBM paper N4210. Archived (PDF) from the original on 2018-09-11. Retrieved 2019-05-12.
17. HP 82240B Infrared Printer (1 ed.). Corvallis, OR, USA: Hewlett-Packard. August 1989. HP reorder number 82240-90014.
18. HP 48G Series – User's Guide (UG) (8 ed.). Hewlett-Packard. December 1994 [1993]. pp. 2–5, 27–16. HP 00048-90126, (00048-90104). Archived from the original on 2016-08-06. Retrieved 2015-09-06.
19. HP 50g / 49g+ / 48gII graphing calculator advanced user's reference manual (AUR) (2 ed.). Hewlett-Packard. 2009-07-14 [2005]. pp. J-1, J-2. HP F2228-90010. Archived from the original on 2018-07-08. Retrieved 2015-10-10. Searchable PDF.
20. "HP RPL TIO Table". holyjoe.org. Archived from the original on 2016-05-23. Retrieved 2015-01-23.
21. Heinz, Sr., Michael W. (2005). "HP-ASCII and Trigraphs". Archived from the original on 2016-08-02. Retrieved 2016-08-02.
22. Finseth, Craig A. (2012-02-25). "chars". Archived from the original on 2017-12-21. Retrieved 2017-12-21.
23. "Vim documentation: *digraphs-default*". 2011-01-15. Archived from the original on 2018-12-20. Retrieved 2019-05-12.
24. "Digraph - Screen User's Manual". Archived from the original on 2018-12-31. Retrieved 2019-05-12.
25. "Appendix F". HP 95LX User's Guide (PDF) (2 ed.). Corvallis, OR, USA: Hewlett-Packard Company, Corvallis Division. June 1991 [March 1991]. F0001-90003. Archived (PDF) from the original on 2016-11-28. Retrieved 2016-11-27.

External links

RFC 1345

Retrieved from "https://en.wikipedia.org/w/index.php?title=Digraphs_and_trigraphs_(programming)&oldid=1327786956#C"
