Language(s) | Various. |
Standard | |
Classification | Stateful system of encodings (with stateless pre-configured subsets) |
Transforms / Encodes | US-ASCII and, depending on implementation: |
Succeeded by | ISO/IEC 10646 (Unicode) |
Other related encoding(s) | Stateful subsets: Pre-configured versions: |
ISO/IEC 2022, Information technology: Character code structure and extension techniques, is an ISO/IEC standard in the field of character encoding. It is equivalent to the ECMA standard ECMA-35,[1][2] the ANSI standard ANSI X3.41[3] and the Japanese Industrial Standard JIS X 0202. Originating in 1971, it was most recently revised in 1994.[4]
ISO 2022 specifies a general structure which character encodings can conform to, dedicating particular ranges of bytes (0x00–1F and 0x7F–9F) to be used for non-printing control codes[5] for formatting and in-band instructions (such as line breaks or formatting instructions for text terminals), rather than graphical characters. It also specifies a syntax for escape sequences, multiple-byte sequences beginning with the ESC control code, which can likewise be used for in-band instructions.[6] Specific sets of control codes and escape sequences designed to be used with ISO 2022 include ISO/IEC 6429, portions of which are implemented by ANSI.SYS and terminal emulators.
ISO 2022 itself also defines particular control codes and escape sequences which can be used for switching between different coded character sets (for example, between ASCII and the Japanese JIS X 0208) so that several can be used in a single document,[7] effectively combining them into a single stateful encoding (a feature less important since the advent of Unicode). It is designed to be usable in both 8-bit environments and 7-bit environments (those where only seven bits are usable in a byte, such as e-mail without 8BITMIME).[8]
The ASCII character set supports the ISO Basic Latin alphabet (equivalent to the English alphabet), and does not provide good support for languages which use additional letters, or which use a different writing system altogether. Other writing systems with relatively few characters, such as Greek, Cyrillic, Arabic or Hebrew, as well as forms of the Latin script using diacritics or letters absent from the ISO Basic Latin alphabet, have historically been represented on personal computers with different 8-bit, single-byte, extended ASCII encodings, which follow ASCII when the most significant bit is 0 (i.e. bytes 0x00–7F, when represented in hexadecimal), and include additional characters for a most significant bit of 1 (i.e. bytes 0x80–FF). Some of these, such as the ISO 8859 series, conform to ISO 2022,[9][10] while others such as DOS code page 437 do not, usually due to not reserving the bytes 0x80–9F for control codes.
Certain East Asian languages, specifically Chinese, Japanese, and Korean (collectively "CJK"), are written using far more characters than the maximum of 256 which can be represented in a single byte, and were first represented on computers with language-specific double-byte encodings or variable-width encodings; some of these (such as the Simplified Chinese encoding GB 2312) conform to ISO 2022, while others (such as the Traditional Chinese encoding Big5) do not. Control codes in ISO 2022 are always represented with a single byte, regardless of the number of bytes used for graphical characters. CJK encodings used in 7-bit environments which use ISO 2022 mechanisms to switch between character sets are often given names starting with "ISO-2022-", most notably ISO-2022-JP, although some other CJK encodings such as EUC-JP also make use of ISO 2022 mechanisms.[11][12]
Since the first 256 code points of Unicode were taken from ISO 8859-1, Unicode inherits the concept of C0 and C1 control codes from ISO 2022, although it adds other non-printing characters besides the ISO 2022 control codes. However, Unicode transformation formats such as UTF-8 generally deviate from the ISO 2022 structure in various ways.
ISO 2022 escape sequences do, however, exist for switching to and from UTF-8 as a "coding system different from that of ISO 2022",[13] which are supported by certain terminal emulators such as xterm.[14]
ISO/IEC 2022 specifies the following:
A specific implementation does not have to implement all of the standard; the conformance level and the supported character sets are defined by the implementation. Although many of the mechanisms defined by the ISO/IEC 2022 standard are infrequently used, several established encodings are based on a subset of the ISO/IEC 2022 system.[19] In particular, 7-bit encoding systems using ISO/IEC 2022 mechanisms include ISO-2022-JP (or JIS encoding), which has primarily been used in Japanese-language e-mail. 8-bit encoding systems conforming to ISO/IEC 2022 include ISO/IEC 4873 (ECMA-43), to which ISO/IEC 8859 in turn conforms,[9][10] and Extended Unix Code, which is used for East Asian languages.[11] More specialised applications of ISO 2022 include the MARC-8 encoding system used in MARC 21 library records.[3]
The escape sequences for switching to particular character sets or encodings are registered with the ISO-IR registry (except for those set apart for private use, the meanings of which are defined by vendors, or by protocol specifications such as ARIB STD-B24) and follow the patterns defined within the standard. Character encodings making use of these escape sequences require data to be processed sequentially in a forward direction, since the correct interpretation of the data depends on previously encountered escape sequences.
Specific profiles such as ISO-2022-JP may impose extra conditions, such as that the current character set is reset to US-ASCII before the end of a line. Furthermore, the escape sequences declaring the national character sets may be absent if a specific ISO-2022-based encoding permits or requires this, and dictates that particular national character sets are to be used. For example, ISO-8859-1 states that no defining escape sequence is needed.
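The behaviour of such a profile can be observed with an existing implementation. The following is a minimal sketch that assumes Python's built-in "iso2022_jp" codec as the encoder; the sample string and variable names are illustrative only. It shows that the encoded data stays within 7 bits, that a designation escape sequence precedes the Japanese characters, and that the encoder switches back to ASCII afterwards.

```python
# Encode a short mixed string with Python's built-in "iso2022_jp" codec
# and list the escape sequences the encoder emits.
text = "abc 日本 xyz"
data = text.encode("iso2022_jp")

ESC = 0x1B
# In this example every escape sequence is exactly three bytes long
# (ESC, one or two intermediate bytes, final byte), so a fixed-width
# slice is enough; a general parser must not assume this.
escapes = [data[i:i + 3] for i, b in enumerate(data) if b == ESC]

print(data)      # raw 7-bit byte stream
print(escapes)   # designation sequences in order of appearance
```

Because the meaning of the bytes between the two escape sequences depends on the first one, a decoder has to read the stream from the start, as described above.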
To represent large character sets, ISO/IEC 2022 builds on ISO/IEC 646's property that a seven-bit character representation will normally be able to represent 94 graphic (printable) characters (in addition to space and 33 control characters); if only the C0 control codes (narrowly defined) are excluded, this can be expanded to 96 characters. Using two bytes, it is thus possible to represent up to 8,836 (94×94) characters; and, using three bytes, up to 830,584 (94×94×94) characters. Though the standard defines it, no registered character set uses three bytes (although EUC-TW's unregistered G2 does, as does the similarly unregistered CCCII).
For the two-byte character sets, the code point of each character is normally specified in so-called row-cell or kuten[a] form, which comprises two numbers between 1 and 94 inclusive, specifying a row[b] and cell[c] of that character within the zone. For a three-byte set, an additional plane[d] number is included at the beginning.[20] The escape sequences do not only declare which character set is being used, but also whether the set is single-byte or multi-byte (although not how many bytes it uses if it is multi-byte), and also whether each byte has 94 or 96 permitted values.
ISO/IEC 2022 coding specifies a two-layer mapping between character codes and displayed characters. Escape sequences allow any of a large registry of graphic character sets to be "designated"[21] into one of four working sets, named G0 through G3, and shorter control sequences specify the working set that is "invoked"[22] to interpret bytes in the stream.
Encoding byte values ("bit combinations") are often given in column-line notation, where two decimal numbers in the range 00–15 (each corresponding to a single hexadecimal digit) are separated by a slash.[23] Hence, for instance, codes 2/0 (0x20) through 2/15 (0x2F) inclusive may be referred to as "column 02". This is the notation used in the ISO/IEC 2022 / ECMA-35 standard itself.[24] They may be described elsewhere using hexadecimal, as is often used in this article, or using the corresponding ASCII characters,[25] although the escape sequences are actually defined in terms of byte values, and the graphic assigned to that byte value may be altered without affecting the control sequence.
Byte values from the 7-bit ASCII graphic range (hexadecimal 0x20–0x7F), being on the left side of a character code table, are referred to as "GL" codes (with "GL" standing for "graphics left") while bytes from the "high ASCII" range (0xA0–0xFF), if available (i.e. in an 8-bit environment), are referred to as the "GR" codes ("graphics right").[5] The terms "CL" (0x00–0x1F) and "CR" (0x80–0x9F) are defined for the control ranges, but the CL range always invokes the primary (C0) controls, whereas the CR range always either invokes the secondary (C1) controls or is unused.[5]
The delete character DEL (0x7F), the escape character ESC (0x1B) and the space character SP (0x20) are designated "fixed" coded characters[26] and are always available when G0 is invoked over GL, irrespective of what character sets are designated. They may not be included in graphical character sets, although other sizes or types of whitespace character may be.[27]
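As an illustration of the row-cell form, the sketch below converts a row and cell number into the two bytes used when the set is invoked over GL or GR. The offsets (0x20 for GL, 0xA0 for GR) are the conventional mapping for 94×94 sets and are an assumption of this example rather than something spelled out in the paragraph above; the function name is hypothetical.

```python
def kuten_to_bytes(row: int, cell: int, gr: bool = False) -> bytes:
    """Convert a row-cell (kuten) pair of a 94x94 set into two bytes.

    Assumes the conventional mapping: each number 1..94 is added to
    0x20 (GL invocation) or 0xA0 (GR invocation).
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    base = 0xA0 if gr else 0x20
    return bytes([base + row, base + cell])


# Capacity of two-byte and three-byte 94-sets, as quoted in the text.
print(94 ** 2, 94 ** 3)

# Row 16, cell 1 invoked over GL and over GR.
print(kuten_to_bytes(16, 1).hex(), kuten_to_bytes(16, 1, gr=True).hex())
```

The GR form of row 16, cell 1 is the byte pair that EUC-JP uses for the first kanji of JIS X 0208, which gives a quick cross-check against an existing codec.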
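The relationship between column-line notation and byte values is simple arithmetic: the column is the high four bits and the line is the low four bits. A small sketch follows; the function names are hypothetical.

```python
def column_line_to_byte(notation: str) -> int:
    """Convert column-line notation such as '2/15' to a byte value."""
    column, line = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError("column and line must both be in 0..15")
    return column * 16 + line


def byte_to_column_line(value: int) -> str:
    """Convert a byte value to column-line notation (no zero padding)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("not a single byte")
    return f"{value >> 4}/{value & 0x0F}"


print(hex(column_line_to_byte("2/15")))   # last code of column 02
print(byte_to_column_line(0x1B))          # position of ESC
```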
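The four code areas and the three fixed characters can be summarised in a few lines. This is only a sketch of the ranges given above, with hypothetical names.

```python
# Fixed coded characters, available whenever G0 is invoked over GL.
FIXED = {0x1B: "ESC", 0x20: "SP", 0x7F: "DEL"}


def code_area(byte: int) -> str:
    """Return the code area (CL, GL, CR or GR) that a byte value falls in."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a single byte")
    if byte <= 0x1F:
        return "CL"   # primary (C0) controls
    if byte <= 0x7F:
        return "GL"   # graphics left
    if byte <= 0x9F:
        return "CR"   # secondary (C1) controls, or unused
    return "GR"       # graphics right (8-bit environments only)


for value in (0x1B, 0x20, 0x41, 0x8E, 0xA1):
    print(hex(value), code_area(value), FIXED.get(value, ""))
```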
Sequences using the ESC (escape) character take the form ESC [I...] F, where the ESC character is followed by zero or more intermediate bytes[28] (I) from the range 0x20–0x2F, and one final byte[29] (F) from the range 0x30–0x7E.[30]
The first I byte, or absence thereof, determines the type of escape sequence; it might, for instance, designate a working set, or denote a single control function. In all types of escape sequences, F bytes in the range 0x30–0x3F are reserved for unregistered private uses defined by prior agreement between parties.[31]
Control functions from some sets may make use of further bytes following the escape sequence proper. For example, the ISO 6429 control function "Control Sequence Introducer", which can be represented using an escape sequence, is followed by zero or more bytes in the range 0x30–0x3F, then zero or more bytes in the range 0x20–0x2F, then by a single byte in the range 0x40–0x7E, the entire sequence being called a "control sequence".[32]
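The escape sequence syntax described above can be sketched as a small parser. The function names are hypothetical, and the sketch only splits a sequence into its intermediate and final bytes without interpreting them.

```python
ESC = 0x1B


def parse_escape(data: bytes, start: int = 0):
    """Parse the escape sequence beginning at data[start].

    Returns (intermediates, final, next_index), where intermediates is a
    bytes object (possibly empty) and final is the final byte value.
    """
    if start >= len(data) or data[start] != ESC:
        raise ValueError("no ESC at this position")
    i = start + 1
    # Zero or more intermediate bytes from 0x20-0x2F.
    while i < len(data) and 0x20 <= data[i] <= 0x2F:
        i += 1
    # Exactly one final byte from 0x30-0x7E.
    if i >= len(data) or not 0x30 <= data[i] <= 0x7E:
        raise ValueError("missing or invalid final byte")
    return data[start + 1:i], data[i], i + 1


def is_private_use(final: int) -> bool:
    """Final bytes 0x30-0x3F are reserved for private use."""
    return 0x30 <= final <= 0x3F


print(parse_escape(b"\x1b$(C"))
print(parse_escape(b"\x1b#6"), is_private_use(0x36))
```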
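A control sequence of the kind just described can be split into its parts in the same way. The sketch below assumes that the Control Sequence Introducer itself has already been consumed, so the data starts with the first byte after it; the function name and the sample inputs are illustrative.

```python
def parse_control_sequence(data: bytes, start: int = 0):
    """Parse the bytes that follow a Control Sequence Introducer.

    Returns (first_group, second_group, final, next_index): the run of
    bytes from 0x30-0x3F, the run of bytes from 0x20-0x2F, the single
    terminating byte from 0x40-0x7E, and the index after the sequence.
    """
    i = start
    while i < len(data) and 0x30 <= data[i] <= 0x3F:
        i += 1
    j = i
    while j < len(data) and 0x20 <= data[j] <= 0x2F:
        j += 1
    if j >= len(data) or not 0x40 <= data[j] <= 0x7E:
        raise ValueError("missing or invalid terminating byte")
    return data[start:i], data[i:j], data[j], j + 1


print(parse_control_sequence(b"31;1m"))
print(parse_control_sequence(b" _"))
```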
Each of the four working sets G0 through G3 may be a 94-character set or a 94^n-character multi-byte set. Additionally, G1 through G3 may be a 96- or 96^n-character set.
In a 96- or 96^n-character set, the bytes 0x20 through 0x7F when GL-invoked, or 0xA0 through 0xFF when GR-invoked, are allocated to and may be used by the set. In a 94- or 94^n-character set, the bytes 0x20 and 0x7F are not used.[33] When a 96- or 96^n-character set is invoked in the GL region, the space and delete characters (codes 0x20 and 0x7F) are not available until a 94- or 94^n-character set (such as the G0 set) is invoked in GL.[5] 96-character sets cannot be designated to G0.
Registration of a set as a 96-character set does not necessarily mean that the 0x20/A0 and 0x7F/FF bytes are actually assigned by the set; some examples of graphical character sets which are registered as 96-sets but do not use those bytes include the G1 set of I.S. 434,[34] the box drawing set from ISO/IEC 10367,[35] and ISO-IR-164 (a subset of the G1 set of ISO-8859-8 with only the letters, used by CCITT).[36]
Characters are expected to be spacing characters, not combining characters, unless specified otherwise by the graphical set in question.[37] ISO 2022 / ECMA-35 also recognizes the use of the backspace and carriage return control characters as means of combining otherwise spacing characters, as well as the CSI sequence "Graphic Character Combination" (GCC)[37] (CSI 0x20 (SP) 0x5F (_)).
Use of the backspace and carriage return in this manner is permitted by ISO/IEC 646 but prohibited by ISO/IEC 4873 / ECMA-43[39] and by ISO/IEC 8859,[40][41] on the basis that it leaves the graphical character repertoire undefined. ISO/IEC 4873 / ECMA-43 does, however, permit the use of the GCC function provided that the sequence of characters is kept the same and merely displayed in one space, rather than being over-stamped to form a character with a different meaning.[42]
Control character sets are classified as "primary" or "secondary" control code sets,[43] respectively also called "C0" and "C1" control code sets.[44]
A C0 control set must contain the ESC (escape) control character at 0x1B[45] (a C0 set containing only ESC is registered as ISO-IR-104),[46] whereas a C1 control set may not contain the escape control whatsoever.[33] Hence, they are entirely separate registrations, with a C0 set being only a C0 set and a C1 set being only a C1 set.[44]
If codes from the C0 set of ISO 6429 / ECMA-48, i.e. the ASCII control codes, appear in the C0 set, they are required to appear at their ISO 6429 / ECMA-48 locations.[45] Inclusion of transmission control characters in the C0 set, besides the ten included by ISO 6429 / ECMA-48 (namely SOH, STX, ETX, EOT, ENQ, ACK, DLE, NAK, SYN and ETB),[47] or inclusion of any of those ten in the C1 set, is also prohibited by the ISO/IEC 2022 / ECMA-35 standard.[45][33]
A C0 control set is invoked over the CL range 0x00 through 0x1F,[48] whereas a C1 control function may be invoked over the CR range 0x80 through 0x9F (in an 8-bit environment) or by using escape sequences (in a 7-bit or 8-bit environment),[43] but not both. Which style of C1 invocation is used must be specified in the definition of the code version.[49] For example, ISO/IEC 4873 specifies CR bytes for the C1 controls which it uses (SS2 and SS3).[50] If necessary, which invocation is used may be communicated using announcer sequences.
In the latter case, single control functions from the C1 control code set are invoked using "type Fe" escape sequences,[33] meaning those where the ESC control character is followed by a byte from columns 04 or 05 (that is to say, ESC 0x40 (@) through ESC 0x5F (_)).
Additional control functions are assigned to "type Fs" escape sequences (in the range ESC 0x60 (`) through ESC 0x7E (~)); these have permanently assigned meanings rather than depending on the C0 or C1 designations.[51][52] Registration of control functions to type "Fs" sequences must be approved by ISO/IEC JTC 1/SC 2.[52] Other single control functions may be registered to type "3Ft" escape sequences (in the range ESC 0x23 (#) [I...] 0x40 (@) through ESC 0x23 (#) [I...] 0x7E (~)),[53] although no "3Ft" sequences are currently assigned (as of 2019).[54] Some of these are specified in ECMA-35 (ISO 2022 / ANSI X3.41), others in ECMA-48 (ISO 6429 / ANSI X3.64).[55] ECMA-48 refers to these as "independent control functions".[56]
Code | Hex | Abbr. | Name | Effect[54] |
ESC ` | 1B 60 | DMI | Disable manual input | Disables some or all of the manual input facilities of the device. |
ESC a | 1B 61 | INT | Interrupt | Interrupts the current process. |
ESC b | 1B 62 | EMI | Enable manual input | Enables the manual input facilities of the device. |
ESC c | 1B 63 | RIS | Reset to initial state | Resets the device to its state after being powered on.[57] |
ESC d | 1B 64 | CMD | Coding method delimiter | Used when interacting with an outer coding / representation system, see below. |
ESC n | 1B 6E | LS2 | Locking shift two | Shift function, see below. |
ESC o | 1B 6F | LS3 | Locking shift three | Shift function, see below. |
ESC | | 1B 7C | LS3R | Locking shift three right | Shift function, see below. |
ESC } | 1B 7D | LS2R | Locking shift two right | Shift function, see below. |
ESC ~ | 1B 7E | LS1R | Locking shift one right | Shift function, see below. |
Escape sequences of type "Fp" (ESC 0x30 (0) through ESC 0x3F (?)) or of type "3Fp" (ESC 0x23 (#) [I...] 0x30 (0) through ESC 0x23 (#) [I...] 0x3F (?)) are reserved for single private use control codes, by prior agreement between parties.[58] Several such sequences of both types are used by DEC terminals such as the VT100, and are thus supported by terminal emulators.[14]
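The escape sequence types named in this section can be told apart from the intermediate and final bytes alone. The sketch below is illustrative: the function names are hypothetical, sequences whose first intermediate byte is not 0x23 are simply reported as "other", and the conversion from a type Fe final byte to the equivalent CR byte (adding 0x40) is inferred from pairs given in this article, such as ESC N for 0x8E.

```python
def escape_type(intermediates: bytes, final: int) -> str:
    """Classify an escape sequence as Fp, Fe, Fs, 3Fp, 3Ft or other."""
    if not 0x30 <= final <= 0x7E:
        raise ValueError("invalid final byte")
    if not intermediates:
        if final <= 0x3F:
            return "Fp"   # private-use control function
        if final <= 0x5F:
            return "Fe"   # control function from the C1 set
        return "Fs"       # registered single control function
    if intermediates[0] == 0x23:  # '#'
        return "3Fp" if final <= 0x3F else "3Ft"
    return "other"        # designations, announcers and so on


def fe_to_c1(final: int) -> int:
    """Map the final byte of a type Fe sequence to its 8-bit CR byte."""
    if not 0x40 <= final <= 0x5F:
        raise ValueError("not a type Fe final byte")
    return final + 0x40


print(escape_type(b"", ord("N")), hex(fe_to_c1(ord("N"))))
print(escape_type(b"", ord("c")), escape_type(b"#", ord("6")))
```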
By default, GL codes specify G0 characters and GR codes (where available) specify G1 characters; this may be otherwise specified by prior agreement. The set invoked over each area may also be modified with control codes referred to as shifts, as shown in the table below.[59]
An 8-bit code may have GR codes specifying G1 characters, i.e. with its corresponding 7-bit code using Shift In and Shift Out to switch between the sets (e.g. JIS X 0201),[60] although some instead have GR codes specifying G2 characters, with the corresponding 7-bit code using a single-shift code to access the second set (e.g. T.51).[61]
The codes shown in the table below are the most common encodings of these control codes, conforming to ISO/IEC 6429. The LS2, LS3, LS1R, LS2R and LS3R shifts are registered as single control functions and are always encoded as the escape sequences listed below,[54] whereas the others are part of a C0 or C1 control code set (as shown below, SI (LS0) and SO (LS1) are C0 controls and SS2 and SS3 are C1 controls), meaning that their coding and availability may vary depending on which control sets are designated: they must be present in the designated control sets if their functionality is used.[48][49] The C1 controls themselves, as mentioned above, may be represented using escape sequences or 8-bit bytes, but not both.
Alternative encodings of the single-shifts as C0 control codes are available in certain control code sets. For example, SS2 and SS3 are usually available at 0x19 and 0x1D respectively in T.51[61] and T.61.[62] This coding is currently recommended by ISO/IEC 2022 / ECMA-35 for applications requiring 7-bit single-byte representations of SS2 and SS3,[63] and may also be used for SS2 only,[64] although older code sets with SS2 at 0x1C also exist,[65][66][67] and were mentioned as such in an earlier edition of the standard.[68] The 0x8E and 0x8F coding of the single shifts as shown below is mandatory for ISO/IEC 4873 levels 2 and 3.[69]
Code | Hex | Abbr. | Name | Effect |
SI | 0F | SI / LS0 | Shift In / Locking shift zero | GL encodes G0 from now on[70][71] |
SO | 0E | SO / LS1 | Shift Out / Locking shift one | GL encodes G1 from now on[70][71] |
ESC n | 1B 6E | LS2 | Locking shift two | GL encodes G2 from now on[70][71] |
ESC o | 1B 6F | LS3 | Locking shift three | GL encodes G3 from now on[70][71] |
CR area: SS2; escape code: ESC N | CR area: 8E; escape code: 1B 4E | SS2 | Single shift two | GL or GR (see below) encodes G2 for the immediately following character only[72] |
CR area: SS3; escape code: ESC O | CR area: 8F; escape code: 1B 4F | SS3 | Single shift three | GL or GR (see below) encodes G3 for the immediately following character only[72] |
ESC ~ | 1B 7E | LS1R | Locking shift one right | GR encodes G1 from now on[73] |
ESC } | 1B 7D | LS2R | Locking shift two right | GR encodes G2 from now on[73] |
ESC | | 1B 7C | LS3R | Locking shift three right | GR encodes G3 from now on[73] |
Although officially considered shift codes and named accordingly, single-shift codes are not always viewed as shifts,[12] and they may simply be viewed as prefix bytes (i.e. the first bytes in a multi-byte sequence),[11] since they do not require the encoder to keep the currently active set as state, unlike locking shift codes. In 8-bit environments, either GL or GR, but not both, may be used as the single-shift area. This must be specified in the definition of the code version.[72] For instance, ISO/IEC 4873 specifies GL, whereas packed EUC specifies GR. In 7-bit environments, only GL is used as the single-shift area.[74][75] If necessary, which single-shift area is used may be communicated using announcer sequences.
The names "locking shift zero" (LS0) and "locking shift one" (LS1) refer to the same pair of C0 control characters (0x0F and 0x0E) as the names "shift in" (SI) and "shift out" (SO). However, the standard refers to them as LS0 and LS1 when they are used in 8-bit environments and as SI and SO when they are used in 7-bit environments.[59]
The ISO/IEC 2022 / ECMA-35 standard permits, but discourages, invoking G1, G2 or G3 in both GL and GR simultaneously.[76]
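The difference between locking shifts (which change state until further notice) and single shifts (which affect one character) can be illustrated with a toy tracer. This is a sketch under several simplifying assumptions, not a conforming decoder: all working sets are treated as single-byte, the input is assumed to contain only graphic bytes and the shift functions from the table above, and both the CR-byte and escape forms of SS2 and SS3, as well as both single-shift areas, are accepted even though a real code version would fix one of each. All names are hypothetical.

```python
LOCKING_C0 = {0x0F: 0, 0x0E: 1}                 # SI / LS0, SO / LS1
LOCKING_ESC_GL = {0x6E: 2, 0x6F: 3}             # LS2, LS3
LOCKING_ESC_GR = {0x7E: 1, 0x7D: 2, 0x7C: 3}    # LS1R, LS2R, LS3R
SINGLE_C1 = {0x8E: 2, 0x8F: 3}                  # SS2, SS3 as CR bytes
SINGLE_ESC = {0x4E: 2, 0x4F: 3}                 # SS2, SS3 as ESC N / ESC O


def trace_shifts(data: bytes):
    """Return a list of (byte, n) pairs: each graphic byte and the index
    n of the working set Gn that it would be interpreted with."""
    gl, gr = 0, 1          # default invocation: G0 in GL, G1 in GR
    single = None          # pending single shift, if any
    result = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in LOCKING_C0:
            gl = LOCKING_C0[b]
        elif b in SINGLE_C1:
            single = SINGLE_C1[b]
        elif b == 0x1B and i + 1 < len(data):
            i += 1
            f = data[i]
            if f in LOCKING_ESC_GL:
                gl = LOCKING_ESC_GL[f]
            elif f in LOCKING_ESC_GR:
                gr = LOCKING_ESC_GR[f]
            elif f in SINGLE_ESC:
                single = SINGLE_ESC[f]
        elif 0x20 <= b <= 0x7F or b >= 0xA0:
            if single is not None:
                result.append((b, single))
                single = None          # single shifts apply once only
            else:
                result.append((b, gl if b <= 0x7F else gr))
        i += 1
    return result


print(trace_shifts(b"A\x0eB\x0fC"))      # SO and SI around one byte
print(trace_shifts(b"\x8e\xb1\xb1"))     # SS2 affects only the next byte
```

The tracer has to carry `gl` and `gr` across the whole stream, whereas `single` is cleared after one character, which is why single shifts can also be viewed as prefix bytes.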
The ISO International register of coded character sets to be used with escape sequences (ISO-IR) lists graphical character sets, control code sets, single control codes and so forth which have been registered for use with ISO/IEC 2022. The procedure for registering codes and sets with the ISO-IR registry is specified by ISO/IEC 2375. Each registration receives a unique escape sequence, and a unique registry entry number to identify it.[77][78] For example, the CCITT character set for Simplified Chinese is known as ISO-IR-165.
Registration of coded character sets with the ISO-IR registry identifies the documents specifying the character set or control function associated with an ISO/IEC 2022 non-private-use escape sequence. This may be a standard document; however, registration does not create a new ISO standard, does not commit the ISO or IEC to adopt it as an international standard, and does not commit the ISO or IEC to add any of its characters to the Universal Coded Character Set.[79]
ISO-IR registered escape sequences are also used encapsulated in a Formal Public Identifier to identify character sets used for numeric character references in SGML (ISO 8879). For example, the string "ISO 646-1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0" can be used to identify the International Reference Version of ISO 646-1983,[80] and the HTML 4.01 specification uses "ISO Registration Number 177//CHARSET ISO/IEC 10646-1:1993 UCS-4 with implementation level 3//ESC 2/5 2/15 4/6" to identify Unicode.[81] The textual representation of the escape sequence, included in the third element of the FPI, will be recognised by SGML implementations for supported character sets.[80]
Escape sequences to designate character sets take the form ESC I [I...] F. As mentioned above, the intermediate (I) bytes are from the range 0x20–0x2F, and the final (F) byte is from the range 0x30–0x7E. The first I byte (or, for a multi-byte set, the first two) identifies the type of character set and the working set it is to be designated to, whereas the F byte (together with any additional I bytes) identifies the character set itself, as assigned in the ISO-IR register (or, for the private-use escape sequences, by prior agreement).
Additional I bytes may be added before the F byte to extend the F byte range. This is currently only used with 94-character sets, where codes of the form ESC ( ! F have been assigned.[82] At the other extreme, no multibyte 96-sets have been registered, so the sequences below are strictly theoretical.
As with other escape sequence types, the range 0x30–0x3F is reserved for private-use F bytes,[31] in this case for private-use character set definitions (which might include unregistered sets defined by protocols such as ARIB STD-B24[83] or MARC-8,[3] or vendor-specific sets such as DEC Special Graphics).[84] However, in a graphical set designation sequence, if the second I byte (for a single-byte set) or the third I byte (for a double-byte set) is 0x20 (space), the set denoted is a "dynamically redefinable character set" (DRCS) defined by prior agreement,[85] which is also considered private use.[31] A graphical set being considered a DRCS implies that it represents a font of exact glyphs, rather than a set of abstract characters.[86] The manner in which DRCS sets and associated fonts are transmitted, allocated and managed is not stipulated by ISO/IEC 2022 / ECMA-35 itself, although it recommends allocating them sequentially starting with F byte 0x40 (@);[87] however, a manner for transmitting DRCS fonts is defined within some telecommunication protocols such as World System Teletext.[88]
There are also three special cases for multi-byte codes. The code sequences ESC $ @, ESC $ A, and ESC $ B were all registered when the contemporary version of the standard allowed multi-byte sets only in G0, so must be accepted in place of the sequences ESC $ ( @ through ESC $ ( B to designate to the G0 character set.[89]
There are additional (rarely used) features for switching control character sets, but this is a single-level lookup, in that (as noted above) the C0 set is always invoked over CL, and the C1 set is always invoked over CR or by using escape codes. As noted above, it is required that any C0 character set include the ESC character at position 0x1B, so that further changes are possible. The control set designation sequences (as opposed to the graphical set ones) may also be used from within ISO/IEC 10646 (UCS/Unicode), in contexts where processing ANSI escape codes is appropriate, provided that each byte in the sequence is padded to the code unit size of the encoding.[90]
A table of escape sequence I bytes and the designation or other function which they perform is below.[91]
Code | Hex | Abbr. | Name | Effect | Example |
ESC SP F | 1B 20 F | ACS | Announce code structure | Specifies code features used, e.g. working sets (see below).[92] | ESC SP L (ISO 4873 level 1) |
ESC ! F | 1B 21 F | CZD | C0-designate | F selects a C0 control character set to be used.[93] | ESC ! @ (ASCII C0 codes) |
ESC " F | 1B 22 F | C1D | C1-designate | F selects a C1 control character set to be used.[94] | ESC " C (ISO 6429 C1 codes) |
ESC # F | 1B 23 F | - | (Single control function) | (Reserved for sequences for control functions, see above.) | ESC # 6 (private use: DEC Double Width Line)[95] |
ESC $ ( F | 1B 24 28 F | GZDM4 | G0-designate multibyte 94-set | F selects a 94^n-character set to be used for G0.[89] | ESC $ ( C (KS X 1001 in G0) |
ESC $ ) F | 1B 24 29 F | G1DM4 | G1-designate multibyte 94-set | F selects a 94^n-character set to be used for G1.[89] | ESC $ ) A (GB 2312 in G1) |
ESC $ * F | 1B 24 2A F | G2DM4 | G2-designate multibyte 94-set | F selects a 94^n-character set to be used for G2.[89] | ESC $ * B (JIS X 0208 in G2) |
ESC $ + F | 1B 24 2B F | G3DM4 | G3-designate multibyte 94-set | F selects a 94^n-character set to be used for G3.[89] | ESC $ + D (JIS X 0212 in G3) |
ESC $ , F | 1B 24 2C F | - | (not used) | (not used)[f] | - |
ESC $ - F | 1B 24 2D F | G1DM6 | G1-designate multibyte 96-set | F selects a 96^n-character set to be used for G1.[89] | ESC $ - 1 (private use) |
ESC $ . F | 1B 24 2E F | G2DM6 | G2-designate multibyte 96-set | F selects a 96^n-character set to be used for G2.[89] | ESC $ . 2 (private use) |
ESC $ / F | 1B 24 2F F | G3DM6 | G3-designate multibyte 96-set | F selects a 96^n-character set to be used for G3.[89] | ESC $ / 3 (private use) |
ESC % F | 1B 25 F | DOCS | Designate other coding system | Switches coding system, see below. | ESC % G (UTF-8) |
ESC & F | 1B 26 F | IRR | Identify revised registration | Prefixes designation escape to denote revision.[g] | ESC & @ ESC $ B (JIS X 0208:1990 in G0) |
ESC ' F | 1B 27 F | - | (not used) | (not used) | - |
ESC ( F | 1B 28 F | GZD4 | G0-designate 94-set | F selects a 94-character set to be used for G0.[89] | ESC ( B (ASCII in G0) |
ESC ) F | 1B 29 F | G1D4 | G1-designate 94-set | F selects a 94-character set to be used for G1.[89] | ESC ) I (JIS X 0201 Kana in G1) |
ESC * F | 1B 2A F | G2D4 | G2-designate 94-set | F selects a 94-character set to be used for G2.[89] | ESC * v (ITU T.61 RHS in G2) |
ESC + F | 1B 2B F | G3D4 | G3-designate 94-set | F selects a 94-character set to be used for G3.[89] | ESC + D (NATS-SEFI-ADD in G3) |
ESC , F | 1B 2C F | - | (not used) | (not used)[h] | - |
ESC - F | 1B 2D F | G1D6 | G1-designate 96-set | F selects a 96-character set to be used for G1.[89] | ESC - A (ISO 8859-1 RHS in G1) |
ESC . F | 1B 2E F | G2D6 | G2-designate 96-set | F selects a 96-character set to be used for G2.[89] | ESC . B (ISO 8859-2 RHS in G2) |
ESC / F | 1B 2F F | G3D6 | G3-designate 96-set | F selects a 96-character set to be used for G3.[89] | ESC / b (ISO 8859-15 RHS in G3) |
Note that the registry of F bytes is independent for the different types. The 94-character graphic set designated by ESC ( A through ESC + A is not related in any way to the 96-character set designated by ESC - A through ESC / A. And neither of those is related to the 94^n-character set designated by ESC $ ( A through ESC $ + A, and so on; the final bytes must be interpreted in context. (Indeed, without any intermediate bytes, ESC A is a way of specifying the C1 control code 0x81.)
Also note that C0 and C1 control character sets are independent; the C0 control character set designated by ESC ! A (which happens to be the NATS control set for newspaper text transmission) is not the same as the C1 control character set designated by ESC " A (the CCITT attribute control set for Videotex).
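The way the intermediate bytes select the working set and set type, while the remaining bytes identify the set, can be sketched as follows. The function and table names are hypothetical; the sketch covers only graphical set designations (including the three legacy forms ESC $ @, ESC $ A and ESC $ B) and does not look the identifier up in any registry.

```python
# First intermediate byte (after an optional '$') -> (working set, set size).
DESIGNATORS = {
    0x28: (0, 94), 0x29: (1, 94), 0x2A: (2, 94), 0x2B: (3, 94),
    0x2D: (1, 96), 0x2E: (2, 96), 0x2F: (3, 96),
}


def parse_designation(seq: bytes):
    """Parse a graphical set designation escape sequence.

    Returns (working_set, size, multibyte, identifier), where size is 94
    or 96 (per byte), multibyte tells whether '$' was present, and
    identifier is the F byte preceded by any additional I bytes.
    """
    if len(seq) < 3 or seq[0] != 0x1B:
        raise ValueError("not an escape sequence of sufficient length")
    if not 0x30 <= seq[-1] <= 0x7E:
        raise ValueError("invalid final byte")
    body = seq[1:]
    multibyte = body[0] == 0x24            # '$' marks a multi-byte set
    if multibyte:
        body = body[1:]
        # Legacy forms registered when multi-byte sets were G0-only.
        if len(body) == 1 and body[0] in (0x40, 0x41, 0x42):
            return ("G0", 94, True, body)
    if len(body) < 2 or body[0] not in DESIGNATORS:
        raise ValueError("not a graphical set designation")
    working_set, size = DESIGNATORS[body[0]]
    return (f"G{working_set}", size, multibyte, body[1:])


for example in (b"\x1b(B", b"\x1b-A", b"\x1b$)A", b"\x1b$B"):
    print(example, parse_designation(example))
```

The same final byte (for example A) appears with different meanings in the examples, which is why the result keeps the set type alongside the identifier.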
The standard also defines a way to specify coding systems that do not follow its own structure.
A sequence is also defined for returning to ISO/IEC 2022; the registrations which support this sequence as encoded in ISO/IEC 2022 comprise (as of 2019) various Videotex formats, UTF-8, and UTF-1.[99] A second I byte of 0x2F (/) is included in the designation sequences of codes which do not use that byte sequence to return to ISO 2022; they may have their own means to return to ISO 2022 (such as a different or padded sequence) or none at all.[100] All existing registrations of the latter type (as of 2019) are either transparent raw data, Unicode/UCS formats, or subsets thereof.[101]
Code | Hex | Abbr. | Name | Effect |
ESC % @ | 1B 25 40 | DOCS | Designate other coding system ("standard return") | Return to ISO/IEC 2022 from another encoding.[100] |
ESC % F | 1B 25 F | DOCS | Designate other coding system ("with standard return")[99] | F selects an 8-bit code; use ESC % @ to return.[100] |
ESC % / F | 1B 25 2F F | DOCS | Designate other coding system ("without standard return")[101] | F selects an 8-bit code; there is no standard way to return.[100] |
ESC d | 1B 64 | CMD | Coding method delimiter | Denotes the end of an ISO/IEC 2022 coded sequence.[102] |
Of particular interest are the sequences which switch to ISO/IEC 10646 (Unicode) formats which do not follow the ISO/IEC 2022 structure. These include UTF-8 (which does not reserve the range 0x80–0x9F for control characters), its predecessor UTF-1 (which mixes GR and GL bytes in multi-byte codes), and UTF-16 and UTF-32 (which use wider coding units).[99][101]
Several codes were also registered for subsets (levels 1 and 2) of UTF-8, UTF-16 and UTF-32, as well as for three levels of UCS-2.[101] However, the only codes currently specified by ISO/IEC 10646 are the level-3 codes for UTF-8, UTF-16 and UTF-32 and the unspecified-level code for UTF-8, with the rest being listed as deprecated.[103] ISO/IEC 10646 stipulates that the big-endian formats of UTF-16 and UTF-32 are designated by their escape sequences.[104]
Unicode Format | Code(s) | Hex[103] | Deprecated codes | Deprecated hex[99][101][103] |
UTF-1 | (UTF-1 not in current ISO/IEC 10646.) | | ESC % B | 1B 25 42 |
UTF-8 | ESC % G, ESC % / I | 1B 25 47,[13] 1B 25 2F 49[105] | ESC % / G, ESC % / H | 1B 25 2F 47, 1B 25 2F 48 |
UTF-16 | ESC % / L | 1B 25 2F 4C[106] | ESC % / @, ESC % / C, ESC % / E, ESC % / J, ESC % / K | 1B 25 2F 40, 1B 25 2F 43, 1B 25 2F 45, 1B 25 2F 4A, 1B 25 2F 4B |
UTF-32 | ESC % / F | 1B 25 2F 46 | ESC % / A, ESC % / D | 1B 25 2F 41, 1B 25 2F 44 |
Of the sequences switching to UTF-8, ESC % G is the one supported by, for example, xterm.[14]
Although use of a variant of the standard return sequence from UTF-16 and UTF-32 is permitted, the bytes of the escape sequence must be padded to the size of the code unit of the encoding (i.e. 001B 0025 0040 for UTF-16), so the coding of the standard return sequence does not conform exactly to ISO/IEC 2022. For this reason, the designations for UTF-16 and UTF-32 use a without-standard-return syntax.[107]
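The padding described here amounts to widening each byte of the escape sequence to one code unit. A minimal sketch follows, assuming big-endian code units (matching the big-endian formats that the escape sequences designate); the function name is hypothetical.

```python
STANDARD_RETURN = b"\x1b%@"   # ESC % @


def pad_escape(seq: bytes, unit_bytes: int) -> bytes:
    """Widen each byte of seq to a big-endian code unit of unit_bytes bytes."""
    return b"".join(b.to_bytes(unit_bytes, "big") for b in seq)


print(pad_escape(STANDARD_RETURN, 2).hex())   # UTF-16 code units
print(pad_escape(STANDARD_RETURN, 4).hex())   # UTF-32 code units
```

Since all three bytes are ASCII values, the padded form is the same as encoding the three characters ESC, % and @ in the target encoding.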
For specifying encodings by labels, the X Consortium's Compound Text format defines five private-use DOCS sequences.[108]
The sequence "announce code structure" (ESC SP (0x20) F) is used to announce a specific code structure, or a specific group of ISO 2022 facilities which are used in a particular code version. Although announcements can be combined, certain contradictory combinations (specifically, using locking shift announcements 16–23 with announcements 1, 3 and 4) are prohibited by the standard, as is using additional announcements on top of ISO/IEC 4873 level announcements 12–14[92] (which fully specify the permissible structural features). Announcement sequences are as follows:
Number | Code | Hex | Code version feature announced[92] |
1 | ESC SP A | 1B 20 41 | G0 in GL, GR absent or unused, no locking shifts. |
2 | ESC SP B | 1B 20 42 | G0 and G1 invoked to GL by locking shifts, GR absent or unused. |
3 | ESC SP C | 1B 20 43 | G0 in GL, G1 in GR, no locking shifts, requires an 8-bit environment. |
4 | ESC SP D | 1B 20 44 | G0 in GL, G1 in GR if 8-bit, no locking shifts unless in a 7-bit environment. |
5 | ESC SP E | 1B 20 45 | Shift functions preserved during 7-bit/8-bit conversion. |
6 | ESC SP F | 1B 20 46 | C1 controls using escape sequences. |
7 | ESC SP G | 1B 20 47 | C1 controls in CR region in 8-bit environments, as escape sequences otherwise. |
8 | ESC SP H | 1B 20 48 | 94-character graphical sets only. |
9 | ESC SP I | 1B 20 49 | 94-character and/or 96-character graphical sets. |
10 | ESC SP J | 1B 20 4A | Uses a 7-bit code, even if an eighth bit is available for use. |
11 | ESC SP K | 1B 20 4B | Requires an 8-bit code. |
12 | ESC SP L | 1B 20 4C | Complies with ISO/IEC 4873 (ECMA-43) level 1. |
13 | ESC SP M | 1B 20 4D | Complies with ISO/IEC 4873 (ECMA-43) level 2. |
14 | ESC SP N | 1B 20 4E | Complies with ISO/IEC 4873 (ECMA-43) level 3. |
16 | ESC SP P | 1B 20 50 | SI / LS0 used. |
18 | ESC SP R | 1B 20 52 | SO / LS1 used. |
19 | ESC SP S | 1B 20 53 | LS1R used in 8-bit environments, SO used in 7-bit environments. |
20 | ESC SP T | 1B 20 54 | LS2 used. |
21 | ESC SP U | 1B 20 55 | LS2R used in 8-bit environments, LS2 used in 7-bit environments. |
22 | ESC SP V | 1B 20 56 | LS3 used. |
23 | ESC SP W | 1B 20 57 | LS3R used in 8-bit environments, LS3 used in 7-bit environments. |
26 | ESC SP Z | 1B 20 5A | SS2 used. |
27 | ESC SP [ | 1B 20 5B | SS3 used. |
28 | ESC SP \ | 1B 20 5C | Single-shifts invoke over GR. |
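In every row of the table above, the final byte equals 0x40 plus the announcement number. The Python sketch below builds announcer sequences on that observation and checks a set of announcements against the two prohibitions described before the table. The function names `announcer` and `is_permitted` are hypothetical, and the check covers only those two rules, not everything the standard may require.

```python
ESC, SP = 0x1B, 0x20

# Announcement numbers that appear in the table above.
LISTED = set(range(1, 15)) | {16} | set(range(18, 24)) | {26, 27, 28}

def announcer(number: int) -> bytes:
    """Build ESC SP F, taking F = 0x40 + number as observed in the table."""
    if number not in LISTED:
        raise ValueError(f"announcement {number} is not listed in the table")
    return bytes([ESC, SP, 0x40 + number])

def is_permitted(numbers: set) -> bool:
    """Apply only the two combination rules stated in the text."""
    uses_locking_shifts = any(16 <= n <= 23 for n in numbers)
    if uses_locking_shifts and numbers & {1, 3, 4}:
        return False                      # contradictory: 16-23 with 1, 3 or 4
    if numbers & {12, 13, 14} and len(numbers) > 1:
        return False                      # nothing may be added to 12-14
    return True

# The four announcements that correspond to Extended Unix Code.
euc = b"".join(announcer(n) for n in (3, 26, 27, 28))
print(euc.hex(" "))                       # 1b 20 43 1b 20 5a 1b 20 5b 1b 20 5c
print(is_permitted({3, 26, 27, 28}))      # True
print(is_permitted({3, 18}))              # False
```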
Six 7-bit ISO 2022 code versions (ISO-2022-CN, ISO-2022-CN-EXT, ISO-2022-JP, ISO-2022-JP-1, ISO-2022-JP-2 and ISO-2022-KR) are defined by IETF RFCs, of which ISO-2022-JP and ISO-2022-KR have been extensively used in the past.[109] A number of other variants are defined by vendors, including IBM.[110] Although UTF-8 is the preferred encoding in HTML5, legacy content in ISO-2022-JP remains sufficiently widespread that the WHATWG encoding standard retains support for it,[111] whereas it maps ISO-2022-KR, ISO-2022-CN and ISO-2022-CN-EXT[112] entirely to the replacement character,[113] due to concerns about code injection attacks such as cross-site scripting.[111][113]
8-bit code versions include Extended Unix Code.[11][12] The ISO/IEC 8859 encodings also follow ISO 2022, in a subset stipulated in ISO/IEC 4873.[9][10]
ISO-2022-JP is a widely used encoding for Japanese, in particular in e-mail. It was introduced for use on the JUNET network and later codified in IETF RFC 1468, dated 1993.[114] It has an advantage over other encodings for Japanese in that it does not require 8-bit clean transmission. Microsoft calls it Code page 50220.[115] It starts in ASCII and includes the following escape sequences:
ESC ( B to switch to ASCII (1 byte per character)
ESC ( J to switch to JIS X 0201-1976 (ISO/IEC 646:JP) Roman set (1 byte per character)
ESC $ @ to switch to JIS X 0208-1978 (2 bytes per character)
ESC $ B to switch to JIS X 0208-1983 (2 bytes per character)
Use of the two characters added in JIS X 0208-1990 is permitted, but without including the IRR sequence, i.e. using the same escape sequence as JIS X 0208-1983.[114] Also, because they were registered before it became possible to designate multi-byte sets to anything other than G0, the escapes for JIS X 0208 do not include the second I-byte (.
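The switching behaviour can be observed with the `iso2022_jp` codec in Python's standard library. The sample text is arbitrary; the sketch only shows that the encoder emits ESC $ B before the JIS X 0208 characters and ESC ( B when returning to ASCII, and that every output byte fits in seven bits.

```python
text = "ABC 日本語 xyz"
encoded = text.encode("iso2022_jp")

print(encoded)                              # b'ABC \x1b$BF|K\\8l\x1b(B xyz'
print(all(b < 0x80 for b in encoded))       # True: safe for 7-bit transport
print(encoded.decode("iso2022_jp") == text) # True: the round trip is lossless
```

Each of the three kanji occupies two bytes between the two escape sequences, while the ASCII text on either side is one byte per character.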
The RFC notes that some existing systems did not distinguish ESC ( B from ESC ( J, or did not distinguish ESC $ @ from ESC $ B, but stipulates that the escape sequences should not be changed by systems simply relaying messages such as e-mails.[114] The WHATWG Encoding Standard referenced by HTML5 handles ESC ( B and ESC ( J distinctly, but treats ESC $ @ the same as ESC $ B when decoding, and uses only ESC $ B for JIS X 0208 when encoding.[116] The RFC also notes that some past systems had made erroneous use of the sequence ESC ( H to switch away from JIS X 0208, which is actually registered for ISO-IR-11 (a Swedish variant of ISO 646 and World System Teletext).[114][i]
Use of ESC ( I to switch to the JIS X 0201-1976 Kana set (1 byte per character) is not part of the ISO-2022-JP profile,[114] but is also sometimes used. Python allows it in a variant which it labels ISO-2022-JP-EXT (which also incorporates JIS X 0212 as described below, completing coverage of EUC-JP);[117][118] this is close in both name and structure to an encoding denoted ISO-2022-JPext by DEC, which furthermore adds a two-byte user-defined region accessed with ESC $ ( 0 to complete the coverage of Super DEC Kanji.[119] The WHATWG/HTML5 variant permits decoding JIS X 0201 katakana in ISO-2022-JP input, but converts the characters to their JIS X 0208 equivalents upon encoding.[116] Microsoft's code page for ISO-2022-JP with JIS X 0201 kana additionally permitted is Code page 50221.[115]
Other, older variants known as JIS7 and JIS8 build directly on the 7-bit and 8-bit encodings defined by JIS X 0201 and allow use of JIS X 0201 kana from G1 without escape sequences, using Shift Out and Shift In or setting the eighth bit (GR-invoked), respectively.[120] They are not widely used;[120] JIS X 0208 support in extended 8-bit JIS X 0201 is more commonly achieved via Shift JIS. Microsoft's code page for JIS X 0201-based ISO 2022 with single-byte katakana via Shift Out and Shift In is Code page 50222.[115]
ISO-2022-JP-2 is a multilingual extension of ISO-2022-JP, defined in RFC 1554 (dated 1993), which permits the following escape sequences in addition to the ISO-2022-JP ones. The ISO/IEC 8859 parts are 96-character sets which cannot be designated to G0, and are accessed from G2 using the 7-bit escape sequence form of the single-shift code SS2:[121]
ESC $ A to switch to GB 2312-1980 (2 bytes per character)
ESC $ ( C to switch to KS X 1001-1992 (2 bytes per character)
ESC $ ( D to switch to JIS X 0212-1990 (2 bytes per character)
ESC . A to switch to ISO/IEC 8859-1 high part, Extended Latin 1 set (1 byte per character) [designated to G2]
ESC . F to switch to ISO/IEC 8859-7 high part, Basic Greek set (1 byte per character) [designated to G2]
ISO-2022-JP with the ISO-2022-JP-2 representation of JIS X 0212, but not the other extensions, was subsequently dubbed ISO-2022-JP-1 by RFC 2237, dated 1997.[122]
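To make the G2 mechanism used for the ISO/IEC 8859 parts more concrete, here is a minimal hand-written decoder sketch in Python. It handles only ASCII, the two G2 designations ESC . A and ESC . F, and the 7-bit form of SS2, which is ESC N. The function name `decode_g2_demo` is invented, the multi-byte sets are deliberately ignored, and setting the high bit to reuse Python's ISO 8859 codecs is a shortcut chosen for this illustration.

```python
ESC = 0x1B

# G2 designations of 96-character sets, mapped to Python codec names.
G2_SETS = {b".A": "iso8859-1", b".F": "iso8859-7"}

def decode_g2_demo(data: bytes) -> str:
    """Decode ASCII plus ISO 8859 high parts reached through ESC N (SS2)."""
    out, g2, i = [], None, 0
    while i < len(data):
        if data[i] == ESC and data[i + 1:i + 3] in G2_SETS:
            g2 = G2_SETS[data[i + 1:i + 3]]   # designate a set to G2
            i += 3
        elif data[i] == ESC and data[i + 1:i + 2] == b"N":
            if g2 is None:
                raise ValueError("SS2 used before any G2 designation")
            # SS2 applies to exactly one following byte; setting the high bit
            # maps it onto the high part of the 8-bit ISO 8859 codec.
            out.append(bytes([data[i + 2] | 0x80]).decode(g2))
            i += 3
        else:
            out.append(chr(data[i]))          # plain ASCII from G0
            i += 1
    return "".join(out)

print(decode_g2_demo(b"caf\x1b.A\x1bNi \x1b.F\x1bNa"))  # café α
```

Unlike the G0 designations, the G2 designation does not change how following bytes are read; each non-ASCII character needs its own ESC N.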
IBM implements nine 7-bit ISO 2022 based encodings for Japanese, each using a different set of escape sequences: IBM-956, IBM-957, IBM-958, IBM-959, IBM-5052, IBM-5053, IBM-5054, IBM-5055 and ISO-2022-JP, which are collectively termed "TCP/IP Japanese coded character sets".[123] CCSID 9148 is the standard (RFC 1468) ISO-2022-JP.[124]
Code page / CCSID | ACRI definition number | Escape sequences for ACRI[110] |
956[125] | TCP-01 |
957[126] | TCP-02 |
958[127] | TCP-03 |
959[128] | TCP-04 |
5052[129] | TCP-05 |
5053[130] | TCP-06 |
5054[131] | TCP-07 |
5055[132] | TCP-08 |
9148[124] | TCP-16 |
The JIS X 0213 standard, first published in 2000, defines an updated version of ISO-2022-JP, without the ISO-2022-JP-2 extensions, named ISO-2022-JP-3. The additions made by JIS X 0213 compared to the base JIS X 0208 standard resulted in a new registration being made for the extended JIS plane 1, while the new plane 2 received its own registration. The further additions to plane 1 in the 2004 edition of the standard resulted in an additional registration being added to a further revision of the profile, dubbed ISO-2022-JP-2004. In addition to the basic ISO-2022-JP designation codes, the following designations are recognized:
ESC ( I to switch to JIS X 0201-1976 Kana set (1 byte per character)
ESC $ ( O to switch to JIS X 0213-2000 Plane 1 (2 bytes per character)
ESC $ ( P to switch to JIS X 0213-2000 Plane 2 (2 bytes per character)
ESC $ ( Q to switch to JIS X 0213-2004 Plane 1 (2 bytes per character, ISO-2022-JP-2004 only)
ISO-2022-KR is defined in RFC 1557, dated 1993.[133] It encodes ASCII and the Korean double-byte KS X 1001-1992,[134][135] previously named KS C 5601-1987. Unlike ISO-2022-JP-2, it makes use of the Shift Out and Shift In characters to switch between them, after including ESC $ ) C once at the start of a line to designate KS X 1001 to G1.[133]
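The combination of a one-off designation with Shift Out (0x0E) and Shift In (0x0F) can be inspected using the `iso2022_kr` codec in Python's standard library. The sample text is arbitrary, and exactly where the encoder places the designation is an implementation detail of that codec, so the sketch only checks the relative order of the three markers.

```python
text = "한국어 ABC"
encoded = text.encode("iso2022_kr")

designation = encoded.index(b"\x1b$)C")  # ESC $ ) C designates KS X 1001 to G1
shift_out = encoded.index(b"\x0e")       # SO: the following bytes come from G1
shift_in = encoded.index(b"\x0f")        # SI: back to ASCII in G0

print(encoded)
print(designation < shift_out < shift_in)
print(all(b < 0x80 for b in encoded))
```

Because switching is done with single control bytes rather than repeated escape sequences, text that alternates frequently between Korean and ASCII stays compact.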
ISO-2022-CN and ISO-2022-CN-EXT are defined in RFC 1922, dated 1996. They are 7-bit encodings making use both of the Shift Out and Shift In functions (to shift between G0 and G1), and of the 7-bit escape code forms of the single-shift functions SS2 and SS3 (to access G2 and G3).[136] They support the character sets GB 2312 (for simplified Chinese) and CNS 11643 (for traditional Chinese).
The basic ISO-2022-CN profile uses ASCII as its G0 (shift in) set, and also includes GB 2312 and the first two planes of CNS 11643 (due to these two planes being sufficient to represent all traditional Chinese characters from common Big5, to which the RFC provides a correspondence in an appendix):[136]
ESC $ ) A to switch to GB 2312-1980 (2 bytes per character) [designated to G1]
ESC $ ) G to switch to CNS 11643-1992 Plane 1 (2 bytes per character) [designated to G1]
ESC $ * H to switch to CNS 11643-1992 Plane 2 (2 bytes per character) [designated to G2]
The ISO-2022-CN-EXT profile permits the following additional sets and planes.[136]
ESC $ ) E to switch to ISO-IR-165 (2 bytes per character) [designated to G1]
ESC $ + I to switch to CNS 11643-1992 Plane 3 (2 bytes per character) [designated to G3]
ESC $ + J to switch to CNS 11643-1992 Plane 4 (2 bytes per character) [designated to G3]
ESC $ + K to switch to CNS 11643-1992 Plane 5 (2 bytes per character) [designated to G3]
ESC $ + L to switch to CNS 11643-1992 Plane 6 (2 bytes per character) [designated to G3]
ESC $ + M to switch to CNS 11643-1992 Plane 7 (2 bytes per character) [designated to G3]
The ISO-2022-CN-EXT profile further lists additional Guobiao standard graphical sets as being permitted, but conditional on their being assigned registered ISO 2022 escape sequences.[136]
The character after the ESC (for single-byte character sets) or ESC $ (for multi-byte character sets) specifies the type of character set and the working set it is designated to. In the above examples, the character ( (0x28) designates a 94-character set to the G0 character set, whereas ), * or + (0x29–0x2B) designate it to the G1–G3 character sets respectively.
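The rule above can be sketched as a small classifier for the designation sequences quoted in this section. The function name `parse_designation` and its tuple layout are made up for the example. The mapping of - . / (0x2D–0x2F) to G1–G3 for 96-character sets is not stated in the paragraph above and is included as an assumption about the standard, consistent with ESC . A designating to G2. Sequences with further intermediate bytes are rejected rather than handled.

```python
WORKING_SET_94 = {0x28: "G0", 0x29: "G1", 0x2A: "G2", 0x2B: "G3"}  # ( ) * +
WORKING_SET_96 = {0x2D: "G1", 0x2E: "G2", 0x2F: "G3"}              # - . /

def parse_designation(seq: bytes):
    """Return (working_set, set_size, is_multibyte, final_byte) for a designation."""
    if len(seq) < 3 or seq[0] != 0x1B:
        raise ValueError("not a designation escape sequence")
    body = seq[1:]
    multibyte = body[0] == 0x24            # '$' marks a multi-byte set
    if multibyte:
        body = body[1:]
    if multibyte and len(body) == 1:
        # Old-style ESC $ @ / ESC $ A / ESC $ B: no second I-byte, G0 implied.
        return ("G0", 94, True, chr(body[0]))
    if len(body) != 2:
        raise ValueError("unsupported designation form")
    i_byte, final = body
    if i_byte in WORKING_SET_94:
        return (WORKING_SET_94[i_byte], 94, multibyte, chr(final))
    if i_byte in WORKING_SET_96:
        return (WORKING_SET_96[i_byte], 96, multibyte, chr(final))
    raise ValueError("unknown intermediate byte")

for seq in (b"\x1b(J", b"\x1b$B", b"\x1b$)C", b"\x1b$*H", b"\x1b$+I", b"\x1b.A"):
    print(seq, parse_designation(seq))
```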
ISO-2022-KR and ISO-2022-CN are used less frequently than ISO-2022-JP, and are sometimes deliberately not supported due to security concerns. Notably, the WHATWG Encoding Standard used by HTML5 maps ISO-2022-KR, ISO-2022-CN and ISO-2022-CN-EXT (as well as HZ-GB-2312) to the "replacement" decoder,[112] which maps all input to the replacement character (�), in order to prevent certain cross-site scripting and related attacks, which exploit a difference in encoding support between the client and server.[113] Although the same security concern (allowing sequences of ASCII bytes to be interpreted differently) also applies to ISO-2022-JP and UTF-16, they could not be given this treatment due to being much more frequently used in deployed content.[111]
In April 2024, a security flaw[137] was found in the implementation of ISO-2022-CN-EXT in glibc, which led to recommendations to disable the encoding entirely on Linux systems.[138]
A subset of ISO 2022 applied to 8-bit single-byte encodings is defined by ISO/IEC 4873, also published by Ecma International as ECMA-43. ISO/IEC 8859 defines 8-bit codes for ISO/IEC 4873 (or ECMA-43) level 1.[9][10]
ISO/IEC 4873 / ECMA-43 defines three levels of encoding:[139]
Earlier editions of the standard permitted non-ASCII assignments in the G0 set, provided that the ISO/IEC 646 invariant positions were preserved, that the other positions were assigned to spacing (not combining) characters, that 0x23 was assigned to either £ or #, and that 0x24 was assigned to either $ or ¤.[140] For instance, the 8-bit encoding of JIS X 0201 is compliant with earlier editions. This was subsequently changed to fully specify the ISO/IEC 646:1991 IRV / ISO-IR No. 6 set (ASCII).[141][142][143]
The use of the ISO/IEC 646 IRV (synchronised with ASCII since 1991) at ISO/IEC 4873 Level 1 with no C1 or G1 set, i.e. using the IRV in an 8-bit environment in which shift codes are not used and the high bit is always zero, is known as ISO 4873 DV, in which DV stands for "Default Version".[144]
In cases where duplicate characters are available in different sets, the current edition of ISO/IEC 4873 / ECMA-43 only permits using these characters in the lowest numbered working set in which they appear.[145] For instance, if a character appears in both the G1 set and the G3 set, it must be used from the G1 set. However, use from other sets is noted as having been permitted in earlier editions.[143]
ISO/IEC 8859 defines complete encodings at level 1 of ISO/IEC 4873, and does not allow for use of multiple ISO/IEC 8859 parts together. It stipulates that ISO/IEC 10367 should be used instead for levels 2 and 3 of ISO/IEC 4873.[9][10] ISO/IEC 10367:1991 includes G0 and G1 sets matching those used by the first 9 parts of ISO/IEC 8859 (i.e. those which existed as of 1991, when it was published), and some supplementary sets.[146]
Character set designation escape sequences are used for identifying or switching between versions during information interchange only if required by a further protocol, in which case the standard requires an ISO/IEC 2022 announcer sequence specifying the ISO/IEC 4873 level, followed by a complete set of escapes specifying the character set designations for C0, C1, G0, G1, G2 and G3 respectively (but omitting G2 and G3 designations for level 1), with an F-byte of 0x7E denoting an empty set. Each ISO/IEC 4873 level has its own single ISO/IEC 2022 announcer sequence, as follows:[147]
Code | Hex | Announcement |
ESC SP L | 1B 20 4C | ISO 4873 Level 1 |
ESC SP M | 1B 20 4D | ISO 4873 Level 2 |
ESC SP N | 1B 20 4E | ISO 4873 Level 3 |
Extended Unix Code (EUC) is an 8-bit variable-width character encoding system used primarily for Japanese, Korean, and simplified Chinese. It is based on ISO 2022, and only character sets which conform to the ISO 2022 structure can have EUC forms. Up to four coded character sets can be represented (in G0, G1, G2 and G3). The G0 set is invoked over GL, the G1 set is invoked over GR, and the G2 and G3 sets are (if present) invoked using the single shifts SS2 and SS3, which are used as CR bytes (i.e. 0x8E and 0x8F respectively) and invoke over GR (not GL).[11] Locking shift codes are not used.[12]
The code assigned to the G0 set is ASCII, or the country's national ISO 646 character set such as KS-Roman (KS X 1003) or JIS-Roman (the lower half of JIS X 0201).[11] Hence, 0x5C (backslash in US-ASCII) is used to represent a Yen sign in some versions of EUC-JP and a Won sign in some versions of EUC-KR.
G1 is used for a 94x94 coded character set represented in two bytes. The EUC-CN form of GB 2312 and EUC-KR are examples of such two-byte EUC codes. EUC-JP includes characters represented by up to three bytes (i.e. SS3 plus two bytes) whereas a single character in EUC-TW can take up to four bytes (i.e. SS2 plus three bytes).
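Because EUC uses no locking shifts, the length of each character follows from its first byte alone. The sketch below applies this to EUC-JP and compares the prediction with Python's `euc_jp` codec. The function name `euc_jp_char_length` is hypothetical. That G2 holds the single-byte JIS X 0201 kana and G3 holds the two-byte JIS X 0212 in EUC-JP is an assumption about that particular profile rather than something stated above, and the sample characters are arbitrary picks.

```python
def euc_jp_char_length(lead: int) -> int:
    """Predict the byte length of an EUC-JP character from its first byte."""
    if lead < 0x80:
        return 1          # G0 (ASCII or JIS-Roman), invoked over GL
    if lead == 0x8E:
        return 2          # SS2 + one byte from G2 (assumed: JIS X 0201 kana)
    if lead == 0x8F:
        return 3          # SS3 + two bytes from G3 (assumed: JIS X 0212)
    if 0xA1 <= lead <= 0xFE:
        return 2          # two GR bytes from G1 (JIS X 0208)
    raise ValueError(f"0x{lead:02X} cannot start an EUC-JP character")

# One sample each for G0, G1, G2 and G3.
for ch in ("A", "漢", "ｱ", "丂"):
    encoded = ch.encode("euc_jp")
    print(ch, encoded.hex(" "), euc_jp_char_length(encoded[0]) == len(encoded))
```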
The EUC code itself does not make use of the announcer or designation sequences from ISO 2022; however, it corresponds to the following sequence of four announcer sequences, with meanings breaking down as follows.[148]
Individual sequence | Hexadecimal | Feature of EUC denoted |
ESC SP C | 1B 20 43 | ISO-8 (8-bit, G0 in GL, G1 in GR) |
ESC SP Z | 1B 20 5A | G2 accessed using SS2 |
ESC SP [ | 1B 20 5B | G3 accessed using SS3 |
ESC SP \ | 1B 20 5C | Single-shifts invoke over GR |
The X Consortium defined an ISO 2022 profile named Compound Text as an interchange format in 1989.[149] This uses only four control codes: HT (0x09), NL (newline, coded as LF, 0x0A), ESC (0x1B) and CSI (in its 8-bit representation 0x9B),[150] with the SDS (CSI … ]) CSI sequence being used for bidirectional text control.[151] It is an 8-bit code using G0 and G1 for GL and GR, and follows ISO-8859-1 in its initial state.[152] The following F-bytes are used:
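Escape sequence type | Final byte | Graphical set |
GZD4, G1D4 (for 94-character sets) | B (0x42) | ASCII |
GZD4, G1D4 (for 94-character sets) | I (0x49) | JIS X 0201 katakana |
GZD4, G1D4 (for 94-character sets) | J (0x4A) | JIS X 0201 Roman |
G1D6 (for 96-character sets) | A (0x41) | ISO-8859-1 high part |
G1D6 (for 96-character sets) | B (0x42) | ISO-8859-2 high part |
G1D6 (for 96-character sets) | C (0x43) | ISO-8859-3 high part |
G1D6 (for 96-character sets) | D (0x44) | ISO-8859-4 high part |
G1D6 (for 96-character sets) | F (0x46) | ISO-8859-7 high part |
G1D6 (for 96-character sets) | G (0x47) | ISO-8859-6 high part |
G1D6 (for 96-character sets) | H (0x48) | ISO-8859-8 high part |
G1D6 (for 96-character sets) | L (0x4C) | ISO-8859-5 high part |
G1D6 (for 96-character sets) | M (0x4D) | ISO-8859-9 high part |
GZDM4, G1DM4 (for 2-byte sets) | A (0x41) | GB 2312 |
GZDM4, G1DM4 (for 2-byte sets) | B (0x42) | JIS X 0208 |
GZDM4, G1DM4 (for 2-byte sets) | C (0x43) | KS C 5601 |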
For specifying encodings by labels, X11 Compound Text defines five private-use DOCS sequences: ESC % / 0 (1B 25 2F 30) for variable-length encodings, and ESC % / 1 through ESC % / 4 for fixed-length encodings using one through four bytes respectively. Rather than using another escape sequence to return to ISO 2022, the two bytes following the initial escape sequence specify the remaining length in bytes, coded in base-128 using bytes 0x80–FF. The encoding label is included in ISO 8859-1 before the encoded text, and terminated with STX (0x02).
0x41 (A) and 0x42 (B) only, for historical reasons.[89]
Some implementations, such as the SoftBank 2G emoji encoding, use additional escapes of this form for non-ISO-2022-compliant purposes.[96]
ESC , F below for background.
ESC 0x1B 0x2C sequence was defined in early editions of the standard as designating further 94-character sets to G0.[98] Since 96-character sets cannot be designated to G0, this first I byte is not used by the current edition of the standard. However, it is still listed by MARC-8.[3]
ESC ( H to switch to ASCII from a DBCS.
ESC 2/8 4/10.
ESC ( J
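The Compound Text labelled-encoding layout described above (a private-use DOCS sequence, two base-128 length bytes, an ISO 8859-1 label, STX, then the encoded text) can be sketched in Python as follows. The function names `extended_segment` and `segment_length` are invented, and the sketch assumes that the "remaining length" counts the label, the STX byte and the encoded text together. The label and payload in the usage lines are arbitrary examples.

```python
ESC, STX = 0x1B, 0x02

def extended_segment(label: str, data: bytes, bytes_per_char: int = 0) -> bytes:
    """Wrap `data` in ESC % / F M L label STX data (0 means variable length)."""
    if not 0 <= bytes_per_char <= 4:
        raise ValueError("only ESC % / 0 through ESC % / 4 are defined")
    body = label.encode("latin-1") + bytes([STX]) + data
    if len(body) >= 128 * 128:
        raise ValueError("segment too long for two base-128 length bytes")
    high, low = divmod(len(body), 128)
    # Both length bytes carry the high bit, so they fall in 0x80-0xFF.
    header = bytes([ESC, 0x25, 0x2F, 0x30 + bytes_per_char, 0x80 | high, 0x80 | low])
    return header + body

def segment_length(segment: bytes) -> int:
    """Recover the remaining length from the two bytes after the escape sequence."""
    return (segment[4] - 0x80) * 128 + (segment[5] - 0x80)

segment = extended_segment("iso8859-15", b"\xa4", bytes_per_char=1)
print(segment.hex(" "))
print(segment_length(segment) == len(segment) - 6)
```

A reader that does not recognise the label can use the decoded length to skip the whole segment, which is why no return sequence is needed.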