
ISO-IR-165

From Wikipedia, the free encyclopedia

CCITT Chinese set (ISO-IR 165)
MIME / IANA	iso-ir-165
Alias(es)	`CN-GB-ISOIR165` (EUC form)^[1]
Language(s)	Simplified Chinese, English, Russian; partial support: Greek, Japanese
Standard	ITU T.101, annex C
Definitions	ISO-IR 165
Extends	GB 2312
Encoding formats	ISO-2022-CN-EXT, Videotex Data Syntax 2
Succeeded by	GB 18030

The CCITT Chinese Primary Set^[2] is a multi-byte graphic character set for Chinese communications created for the Consultative Committee on International Telephone and Telegraph (CCITT) in 1992.^[3] It is defined in ITU T.101, annex C, which codifies Data Syntax 2 Videotex.^[2] It is registered with the ISO-IR registry for use with ISO/IEC 2022 as ISO-IR-165,^[4] and is encodable in the ISO-2022-CN-EXT code version.^[1]
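As a rough illustration of the ISO/IEC 2022 usage mentioned above, the sketch below wraps already-encoded 7-bit ISO-IR-165 byte pairs in a shift-out run. It assumes the designation sequence ESC $ ) E that RFC 1922 lists for ISO-IR-165 in ISO-2022-CN-EXT. The function name `wrap_iso_ir_165` is hypothetical, and the RFC's rules about when a designation must be repeated are not modelled.

```python
ESC = b"\x1b"
# Assumption: G1 designation for ISO-IR-165 as listed in RFC 1922 (ESC $ ) E).
DESIGNATE_ISO_IR_165 = ESC + b"$)E"
SHIFT_OUT = b"\x0e"  # SO: invoke the G1 set
SHIFT_IN = b"\x0f"   # SI: return to ASCII


def wrap_iso_ir_165(seven_bit: bytes) -> bytes:
    """Wrap 7-bit double-byte codes in a designation plus an SO ... SI run."""
    if len(seven_bit) % 2 != 0:
        raise ValueError("expected an even number of bytes (two per character)")
    if any(not 0x21 <= b <= 0x7E for b in seven_bit):
        raise ValueError("each byte must lie in the 94-set range 0x21..0x7E")
    return DESIGNATE_ISO_IR_165 + SHIFT_OUT + seven_bit + SHIFT_IN


# Row-cell 03-71 in 7-bit form is the byte pair 0x23 0x67.
print(wrap_iso_ir_165(b"\x23\x67"))
```

This only shows the framing around one run of characters; a real encoder would also track which sets are currently designated.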

It is an extended modification of GB/T 2312-80, and corresponds to the union of the mainland Chinese GB standards GB 6345.1-86 and GB 8565.2-88, with some further modifications and extensions. A subset of the GB 6345.1 extensions is incorporated into GB 18030, while GB 8565.2 serves as the mainland Chinese source reference for certain CJK Unified Ideographs.

GB 6345.1

GB 6345.1-86 (32 × 32 Dot Matrix Font Set of Chinese Ideographs for Information Interchange) includes both a corrigendum and an extension for GB 2312.^[3] The corrigendum alters the following two characters:

Alterations made to existing GB 2312 characters by GB 6345.1
Row-cell	EUC	GB 2312 (Unamended)^[5]	GB 6345.1	Notes
03-71	0xA3E7	g (looped)	ɡ	^[a]
79-81	0xEFF1	鍾	锺	^[b]

^[a] Corresponds to U+FF47 ｇ FULLWIDTH LATIN SMALL LETTER G in Unicode; however, the amended reference glyph can also correspond to U+0261 ɡ LATIN SMALL LETTER SCRIPT G. See below for how U+0261 is typically mapped to and from GB 6345.1, versus how it is mapped to and from ISO-IR-165. GB 18030 restores the original^[5] looped glyph at this position.^[6]
^[b] The unamended reference glyph is a Traditional Chinese character corresponding to U+937E. In Simplified Chinese this character is usually replaced with 钟 (U+949F, also the simplification of 鐘), except in personal names; the amended glyph is an alternate simplified form corresponding to U+953A.

Deployed implementations incorporating GB 2312, such as Windows code page 936, generally follow these corrections in mapping 79-81 to U+953A.^[7]
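The tables in this article give each character both as a row-cell (kuten) pair and as an EUC code. The relationship follows the note attached to reference [13]: the 7-bit form is the kuten value plus 0x20 per byte, and the EUC form is the 7-bit form plus 0x80 per byte. The helper names below are hypothetical and the code is only a minimal sketch of that arithmetic.

```python
from typing import Tuple


def kuten_to_seven_bit(row: int, cell: int) -> bytes:
    """Convert a row-cell pair (each 1..94) to the 7-bit ISO/IEC 2022 form."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")
    return bytes([row + 0x20, cell + 0x20])


def kuten_to_euc(row: int, cell: int) -> bytes:
    """Convert a row-cell pair to the 8-bit EUC form (7-bit form + 0x80)."""
    return bytes(b + 0x80 for b in kuten_to_seven_bit(row, cell))


def euc_to_kuten(code: bytes) -> Tuple[int, int]:
    """Convert a two-byte EUC code back to its row-cell pair."""
    if len(code) != 2 or any(not 0xA1 <= b <= 0xFE for b in code):
        raise ValueError("expected two bytes in the range 0xA1..0xFE")
    return code[0] - 0xA0, code[1] - 0xA0


# The two amended positions from the table above.
print(kuten_to_euc(3, 71).hex().upper())   # A3E7
print(kuten_to_euc(79, 81).hex().upper())  # EFF1
```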

The extension adds half-width ISO 646-CN characters in row 10 (in addition to the existing full-width characters in row 3) and extends the set of 26 non-ASCII pinyin characters in row 8 with six more. These GB 6345.1 extensions are also incorporated into GB/T 12345, the Traditional Chinese counterpart to GB 2312, which additionally includes 29 vertical presentation forms in row 6.^[3]^[8]

The later GB/T 6345.1-2010, published in 2011, officially adds half-width forms of the 32 pinyin characters in row 8 (including the six new additions), placing them in row 11.^[9] This addition is not included in GB 18030.^[6]

The six additional pinyin characters from GB 6345.1 and the vertical presentation forms from GB 12345, but not the half-width forms, are included in the classic Mac OS encoding for Simplified Chinese (a modification of EUC-CN),^[10] and also as two-byte codes in GB 18030.^[6] The additional pinyin characters are as follows:^[10]

Extensions made by GB 6345.1 to GB 2312 row 8
Row-cell	EUC	Character^[10]^[6]	Notes
08-27	0xA8BB	U+0251 ɑ LATIN SMALL LETTER ALPHA
08-28	0xA8BC	U+1E3F ḿ LATIN SMALL LETTER M WITH ACUTE	^[a]
08-29	0xA8BD	U+0144 ń LATIN SMALL LETTER N WITH ACUTE
08-30	0xA8BE	U+0148 ň LATIN SMALL LETTER N WITH CARON
08-31	0xA8BF	U+01F9 ǹ LATIN SMALL LETTER N WITH GRAVE	^[b]
08-32	0xA8C0	U+0261 ɡ LATIN SMALL LETTER SCRIPT G	^[c]

^[a] Mapped to the Private Use Area U+E7C7 by Windows code page 936^[11] and the first (2000) edition of GB 18030; this was amended by the 2005 edition.^[6]
^[b] This composed character was added in Unicode 3.0. Prior to this, this character was mapped to its composition sequence (i.e. U+006E U+0300) by Apple.^[10] This change predates the stabilisation of Unicode normalisation forms, which was introduced in Unicode 3.1.^[12] It is mapped to U+E7C8 by Windows code page 936.^[11]
^[c] Matches the unamended reference glyph for 03-71 (see above) in being a looped g, in spite of being typically mapped to U+0261. Mappings used for ISO-IR-165 differ (see below). GB 18030 swaps 03-71 back to the looped g, and makes this one the open g.^[6]
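Note [b] above contrasts the precomposed character U+01F9 with the sequence U+006E U+0300. The short sketch below uses Python's standard `unicodedata` module to show that the two are canonically equivalent under current Unicode normalisation; it says nothing about how any particular mapping table behaved historically, and the helper name `same_under_nfc` is hypothetical.

```python
import unicodedata


def same_under_nfc(first: str, second: str) -> bool:
    """Return True if two strings are identical after NFC normalisation."""
    return unicodedata.normalize("NFC", first) == unicodedata.normalize("NFC", second)


precomposed = "\u01f9"   # LATIN SMALL LETTER N WITH GRAVE
sequence = "n\u0300"     # n followed by COMBINING GRAVE ACCENT

# The raw code point sequences differ, but they normalise to the same text.
print(precomposed == sequence)               # False
print(same_under_nfc(precomposed, sequence))  # True
```

A round-trip converter that compares decoded text against an expected string may therefore want to normalise both sides first.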

These extensions and modifications to GB 2312 were first introduced in GB 5007.1-85 in 1985.

GB 8565.2

GB 8565.2-88 (Information Processing - Coded Character Sets for Text Communication - Part 2: Graphic Characters) defines an extension for GB 2312, adding 705 characters in rows 13–15 and 90–94, of which 69 (all in row 15) are non-hanzi. It includes the GB 2312 corrections from GB 6345.1, but not its extensions.^[3]

The Unihan database references GB 8565.2 as the mainland Chinese source of several hanzi included in Unicode. Its Unihan source abbreviation is G8.^[2]

CCITT changes

ISO-IR-165 incorporates the GB 2312 extensions from both GB 6345.1-86 and GB 8565.2-88.^[3] Additionally, it adds 161 further characters (including 139 hanzi, identified as “general Chinese characters and variants”).^[3]^[4] These CCITT hanzi extensions have on occasion been mistaken for standard GB 8565.2 characters, including in previous revisions of the Unihan database.^[2] In total the set contains 8446 characters.

A number of patterned semigraphic characters are included in row 6.^[4] This collides with the vertical presentation forms included in other extensions such as Mac OS Simplified Chinese^[10] and GB 18030.^[6]

The GB 6345.1 corrections to GB 2312 are applied, but two Unicode mappings are reversed compared to other encodings which include GB 2312 with GB 6345.1 extensions. The table below shows the mappings and their corresponding glyphs, including those of GB 18030:

Row-cell	EUC	GB 2312 (unamended)^[5]	GB 6345.1^[9]	GB 6345.1 mapping^[10]	ISO-IR-165^[4]	ISO-IR-165 mapping^[13]	GB 18030^[6]	GB 18030 mapping^[6]
03-71	0xA3E7	g (looped)	ɡ	U+FF47	ɡ	U+0261	g (looped)	U+FF47
08-32	0xA8C0	(absent)	g (looped)	U+0261	g (looped)	U+FF47	ɡ	U+0261
79-81	0xEFF1	鍾	锺	U+953A	锺	U+953A	锺	U+953A
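The mapping columns of the table above can be restated as a small lookup structure, which makes the swap easier to see. The sketch below only copies the table's values; the names `MAPPINGS` and `differing_cells` are hypothetical and it does not consult any real conversion library.

```python
# Unicode mappings per scheme, keyed by (row, cell), copied from the table above.
MAPPINGS = {
    "GB 6345.1": {(3, 71): 0xFF47, (8, 32): 0x0261, (79, 81): 0x953A},
    "ISO-IR-165": {(3, 71): 0x0261, (8, 32): 0xFF47, (79, 81): 0x953A},
    "GB 18030": {(3, 71): 0xFF47, (8, 32): 0x0261, (79, 81): 0x953A},
}


def differing_cells(first: str, second: str) -> list:
    """List the row-cell positions whose Unicode mapping differs between two schemes."""
    a, b = MAPPINGS[first], MAPPINGS[second]
    return sorted(cell for cell in a if a[cell] != b[cell])


for other in ("ISO-IR-165", "GB 18030"):
    cells = differing_cells("GB 6345.1", other)
    print(other, ["%02d-%02d" % cell for cell in cells])
```

In this restatement only 03-71 and 08-32 differ for ISO-IR-165, and their code points are exchanged; the GB 18030 mappings match the typical GB 6345.1 ones even though its glyphs differ.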

References

[1] Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "Chinese Character Encoding for Internet Messages". Requests for Comments. IETF. doi:10.17487/rfc1922. RFC 1922.
[2] Chung, Jaemin (2018-01-24). "Pseudo-G8 characters" (PDF). ISO/IEC JTC 1/SC 2/WG 2/IRG N2276.
[3] Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 94–111. ISBN 978-0-596-51447-1.
[4] CCITT (1992-07-13). Codes of the Chinese graphic character set for communication (PDF). ITSCJ/IPSJ. ISO-IR-165.
[5] China Association for Standardization. Coded Chinese Graphic Character Set for Information Interchange (PDF). ITSCJ/IPSJ. ISO-IR-58.
[6] Standardization Administration of China (SAC) (2005-11-18). GB 18030-2005: Information Technology—Chinese coded character set.
[7] Steele, Shawn (2000). "cp936 to Unicode table". Microsoft, Unicode Consortium.
[8] Lunde, Ken (1998). Appendix F: GB/T 12345 (PDF). O'Reilly Media. ISBN 9781565922242.
[9] Standardization Administration of China (SAC) (2011-01-10). GB/T 6345.1-2010 信息技术汉字编码字符集(基本集) 32点阵字型第1部分宋体 (in Chinese (China)). China.
[10] "Map (external version) from Mac OS Chinese Simplified encoding to Unicode 3.0 and later". Apple, Inc.
[11] Microsoft. "CODEPAGE 936: PRC GBK (XGB) - ANSI, OEM". Unicode Consortium.
[12] "Unicode Character Encoding Stability Policies". Unicode Consortium. 2017-06-23.
[13] Viswanadha, Raghuram (2000-08-30). "Unicode to ISO-IR-165 table". International Components for Unicode. IBM. (Note: codes are listed in the source in 7-bit form: add 0x80 to each byte for EUC form, or subtract 0x20 for kuten form.)

External links

ISO-IR-165: Code of the Chinese graphic character set for communication (registered 1992, amended 1994)
Unicode mappings for ISO-IR-165
