MIME / IANA | iso-ir-165 |
---|---|
Alias(es) | CN-GB-ISOIR165 (EUC form)[1] |
Language(s) | Simplified Chinese, English, Russian; partial support: Greek, Japanese |
Standard | ITU T.101, annex C |
Definitions | ISO-IR 165 |
Extends | GB 2312 |
Encoding formats | ISO-2022-CN-EXT, Videotex Data Syntax 2 |
Succeeded by | GB 18030 |
The CCITT Chinese Primary Set[2] is a multi-byte graphic character set for Chinese communications created for the Consultative Committee on International Telephone and Telegraph (CCITT) in 1992.[3] It is defined in ITU T.101, annex C, which codifies Data Syntax 2 Videotex.[2] It is registered with the ISO-IR registry for use with ISO/IEC 2022 as ISO-IR-165,[4] and is encodable in the ISO-2022-CN-EXT code version.[1]
It is an extended modification of GB/T 2312-80, and corresponds to the union of the mainland Chinese GB standards GB 6345.1-86 and GB 8565.2-88, with some further modifications and extensions. A subset of the GB 6345.1 extensions is incorporated into GB 18030, while GB 8565.2 serves as the mainland Chinese source reference for certain CJK Unified Ideographs.
GB 6345.1-86 (32 × 32 Dot Matrix Font Set of Chinese Ideographs for Information Interchange) includes both a corrigendum and an extension for GB 2312.[3] The corrigendum alters the following two characters:
Row-cell | EUC | GB 2312 (Unamended)[5] | GB 6345.1 | Notes |
---|---|---|---|---|
03-71 | 0xA3E7 | ｇ | ɡ | [a] |
79-81 | 0xEFF1 | 鍾 | 锺 | [b] |
Deployed implementations incorporating GB 2312, such as Windows code page 936, generally follow these corrections in mapping 79-81 to U+953A.[7]
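
The Row-cell and EUC columns in the table above are related by a fixed offset. The following is a minimal sketch of that arithmetic, assuming the usual convention for 94 × 94 sets in which the EUC form adds 0xA0 to each of the row and cell numbers and the 7-bit ISO/IEC 2022 form adds 0x20. The function names are illustrative only and do not come from any standard or library.

```python
def row_cell_to_euc(row: int, cell: int) -> bytes:
    """Return the two-byte EUC form of a row-cell pair (both numbered 1..94)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")
    # Assumed convention: EUC sets the high bit, so each byte is 0xA0 + number.
    return bytes([0xA0 + row, 0xA0 + cell])


def row_cell_to_7bit(row: int, cell: int) -> bytes:
    """Return the 7-bit (GL) form, as used inside an ISO/IEC 2022 stream."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")
    # Assumed convention: the 7-bit form offsets each number by 0x20.
    return bytes([0x20 + row, 0x20 + cell])


# The two corrected code points from the table above.
print(row_cell_to_euc(3, 71).hex())   # a3e7
print(row_cell_to_euc(79, 81).hex())  # eff1
```

The same offset reproduces the EUC values listed for the other row-cell pairs in this article.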
The extension adds half-width ISO 646-CN characters in row 10 (in addition to the existing full-width characters in row 3) and extends the set of 26 non-ASCII pinyin characters in row 8 with six additional such characters. These GB 6345.1 extensions are also incorporated into GB/T 12345, the Traditional Chinese counterpart to GB 2312, which additionally includes 29 vertical presentation forms in row 6.[3][8]
The later GB/T 6345.1-2010, published in 2011, officially adds half-width forms of the 32 row 8 pinyin characters (including the six new additions) in row 11.[9] This addition is not featured in GB 18030.[6]
The six additional pinyin characters from GB 6345.1 and the vertical presentation forms from GB 12345 (but not the half-width forms) are included in the classic Mac OS encoding for Simplified Chinese (a modification of EUC-CN),[10] and also as two-byte codes in GB 18030.[6] The additional pinyin characters are as follows:[10]
Row-cell | EUC | Character[10][6] | Notes |
---|---|---|---|
08-27 | 0xA8BB | U+0251 ɑ LATIN SMALL LETTER ALPHA | |
08-28 | 0xA8BC | U+1E3F ḿ LATIN SMALL LETTER M WITH ACUTE | [a] |
08-29 | 0xA8BD | U+0144 ń LATIN SMALL LETTER N WITH ACUTE | |
08-30 | 0xA8BE | U+0148 ň LATIN SMALL LETTER N WITH CARON | |
08-31 | 0xA8BF | U+01F9 ǹ LATIN SMALL LETTER N WITH GRAVE | [b] |
08-32 | 0xA8C0 | U+0261 ɡ LATIN SMALL LETTER SCRIPT G | [c] |
These extensions and modifications to GB 2312 were first introduced in GB 5007.1-85 in 1985.
GB 8565.2-88 (Information Processing - Coded Character Sets for Text Communication - Part 2: Graphic Characters) defines an extension for GB 2312, adding 705 characters in rows 13–15 and 90–94, of which 69 (all in row 15) are non-hanzi. It includes the GB 2312 corrections from GB 6345.1, but not its extensions.[3]
The Unihan database references GB 8565.2 as the mainland Chinese source of several hanzi included in Unicode. Its Unihan source abbreviation is G8.[2]
ISO-IR-165 incorporates the GB 2312 extensions from both GB 6345.1-86 and GB 8565.2-88.[3] Additionally, it adds 161 further characters (including 139 hanzi, identified as “general Chinese characters and variants”).[3][4] These CCITT hanzi extensions have on occasion been mistaken for standard GB 8565.2 characters, including in previous revisions of the Unihan database.[2] In total, the set contains 8446 characters.
A number of patterned semigraphic characters are included in row 6.[4] This collides with the vertical presentation forms included in other extensions such as Mac OS Simplified Chinese[10] and GB 18030.[6]
The GB 6345.1 corrections to GB 2312 are applied, but two Unicode mappings are reversed compared to other encodings which include GB 2312 with GB 6345.1 extensions. The table below shows the mappings and their corresponding glyphs, including those of GB 18030:
Row-cell | EUC | GB 2312 (unamended)[5] | GB 6345.1[9] | GB 6345.1 mapping[10] | ISO-IR-165[4] | ISO-IR-165 mapping[13] | GB 18030[6] | GB 18030 mapping[6] |
---|---|---|---|---|---|---|---|---|
03-71 | 0xA3E7 | ｇ | ɡ | U+FF47 | ɡ | U+0261 | ｇ | U+FF47 |
08-32 | 0xA8C0 | (absent) | ![]() | U+0261 | ![]() | U+FF47 | ɡ | U+0261 |
79-81 | 0xEFF1 | 鍾 | 锺 | U+953A | 锺 | U+953A | 锺 | U+953A |
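
The reversed mappings in the table above can be treated as a small set of per-variant overrides on top of a shared base table. The sketch below uses only the code points listed in the table. The names `BASE_MAPPING`, `ISO_IR_165_OVERRIDES` and `to_unicode` are hypothetical and are not taken from any existing converter.

```python
# Keys are (row, cell) pairs; values are Unicode code points from the table above.
# Base values follow the "GB 6345.1 mapping" and "GB 18030 mapping" columns.
BASE_MAPPING = {
    (3, 71): 0xFF47,   # FULLWIDTH LATIN SMALL LETTER G
    (8, 32): 0x0261,   # LATIN SMALL LETTER SCRIPT G
    (79, 81): 0x953A,  # corrected hanzi at 79-81
}

# ISO-IR-165 swaps the two "g" mappings relative to the base table.
ISO_IR_165_OVERRIDES = {
    (3, 71): 0x0261,
    (8, 32): 0xFF47,
}


def to_unicode(row: int, cell: int, iso_ir_165: bool = False) -> str:
    """Look up one of the three tabulated cells, optionally as ISO-IR-165."""
    key = (row, cell)
    if iso_ir_165 and key in ISO_IR_165_OVERRIDES:
        return chr(ISO_IR_165_OVERRIDES[key])
    return chr(BASE_MAPPING[key])


for key in BASE_MAPPING:
    other = ord(to_unicode(*key))
    ir165 = ord(to_unicode(*key, iso_ir_165=True))
    print(f"{key[0]:02d}-{key[1]:02d}: other encodings U+{other:04X}, ISO-IR-165 U+{ir165:04X}")
```

Keeping the differences in a separate override table makes it explicit that only two cells differ, while 79-81 maps to U+953A in every variant shown.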