CNS 11643

From Wikipedia, the free encyclopedia

National standard coded character set of the Republic of China (Taiwan)

CNS 11643
Alias(es)	CSIC (Chinese Standard Interchange Code)
Language	Traditional Chinese
Standard	CNS 11643
Classification	ISO 2022, DBCS, CJK encoding
Encoding formats	EUC-TW (all planes); ISO-2022-CN-EXT (planes 1–7); ISO-2022-CN (planes 1 and 2); MS-20000 (planes 1 and 2); Big5 (planes 2 and most of 1)
Other related encodings	Big5, CCCII

The CNS 11643 character set (Chinese National Standard 11643), also officially known as the Chinese Standard Interchange Code or CSIC^[1] (Chinese: 中文標準交換碼), is the official standard character set of Taiwan (Republic of China). Published and draft editions of CNS 11643 remain the source standards for Unicode reference glyphs for CJK Unified Ideographs submitted for use in Taiwan,^[2] and the character repertoire of CNS 11643 continues to be updated and used for administrative purposes in Taiwan.^[3]

EUC-TW is an encoded representation of CNS 11643 and ASCII in Extended Unix Code (EUC) form. In practice, variants of the Big5 character set, which is closely related to the first two planes of CNS 11643, served as the de facto standard encoding for Traditional Chinese before the introduction of Unicode. Other encodings capable of representing certain CSIC planes include ISO-2022-CN (planes 1 and 2) and ISO-2022-CN-EXT (planes 1 through 7).
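
As a rough illustration of the EUC form, the following sketch packs a CNS 11643 position (plane, row, cell) into EUC-TW bytes. It assumes the commonly described EUC-TW layout, which this article does not spell out: plane 1 as two bytes in the range 0xA1–0xFE, and other planes as a four-byte sequence beginning with the single-shift byte 0x8E followed by a plane byte of 0xA0 + plane. The function name `euc_tw_encode` and the limit of 16 planes are assumptions made for this example only.

```python
SS2 = 0x8E  # single-shift byte assumed to introduce the four-byte form


def euc_tw_encode(plane: int, row: int, cell: int) -> bytes:
    """Pack a CNS 11643 (plane, row, cell) position into EUC-TW bytes.

    Assumed layout (not taken from the article):
      plane 1      -> two bytes:  0xA0+row, 0xA0+cell
      other planes -> four bytes: 0x8E, 0xA0+plane, 0xA0+row, 0xA0+cell
    """
    if not 1 <= plane <= 16:
        raise ValueError("this sketch only handles planes 1 to 16")
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    pair = bytes([0xA0 + row, 0xA0 + cell])
    if plane == 1:
        return pair  # short two-byte form for plane 1
    return bytes([SS2, 0xA0 + plane]) + pair


if __name__ == "__main__":
    print(euc_tw_encode(1, 1, 1).hex())    # plane 1, row 1, cell 1
    print(euc_tw_encode(3, 71, 20).hex())  # plane 3, row 71, cell 20
```

Under these assumptions, a decoder can tell the two forms apart from the first byte alone, since 0x8E never appears as the lead byte of the two-byte form.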

Structure

CNS 11643 is designed to conform to ISO 2022, although only the first seven 94×94-character planes have ISO-IR registrations. The total number of planes has varied with successive revisions of the standard; the most recent pending drafts have 19 planes,^[2] so the maximum possible number of encodable characters across all planes is 19×94×94 = 167884. Planes 1 through 7 are defined by the standard; since 2007, planes 10 through 15 have also been defined by the standard.^[4]^{: 115–122} Prior to this, planes 12 to 15 (35344 code points) were specifically designated for user-defined characters.^{[citation needed]} Unlike in CCCII, the code points assigned to variant forms of a character in CNS 11643 are not related to one another.
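
The capacity figures quoted above follow directly from the 94×94 plane size. A minimal check, with `capacity` as a hypothetical helper name:

```python
CELLS_PER_PLANE = 94 * 94  # each plane is a 94-row by 94-cell grid


def capacity(planes: int) -> int:
    """Number of code points available in the given number of planes."""
    return planes * CELLS_PER_PLANE


if __name__ == "__main__":
    print(capacity(19))  # 167884, all 19 planes of the pending drafts
    print(capacity(4))   # 35344, the four planes 12 to 15
```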

History

The first edition of the standard was published in 1986, and included planes 1 and 2, deriving from levels 1 and 2 of Big5, with some re-ordering due to corrected stroke counts, two duplicate characters being omitted, and the addition of 213 classical radicals in plane 1 (out of 214 Kangxi radicals, of which 210 are effectively duplicates of existing Big5 characters and the remaining three of HKSCS characters;^[5] see also Kangxi Radicals (Unicode block)). Extensions to the standard were subsequently published in 1988 (6319 characters, occupying plane 14) and 1990 (7169 characters, occupying plane 15).^[4]^{: 115–122}

Unicode 1.0.0, although it did not yet include hanzi, included characters for compatibility with CNS 11643: the CJK Compatibility Forms block was titled "CNS 11643 Compatibility" in Unicode 1.0.0.^[6] When the Unicode CJK Unified Ideographs set was being compiled for Unicode 1.0.1, the national bodies submitted character sets to the CJK Joint Research Group for inclusion. The version of CNS 11643 submitted included the plane 14 extension, in addition to further desired characters appended to plane 14 (after 68-21, the last used code point in the standard version of the extension).^[4]^{: 179–180}

In the second edition of the standard, published in 1992, a much larger collection of hanzi was defined across seven planes. The majority of the 1988 plane 14 extension, comprising the 6148 code points 01-01 through 66-38, was adopted as plane 3 (with the remaining 171 characters, code points 66-39 through 68-21, being instead distributed amongst plane 4). The plane 15 extension was not included, although 338 of its characters were included amongst planes 4 through 7.^[4]^{: 115–122}
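
The counts above can be reproduced from the row-cell notation, treating each plane as 94 rows of 94 cells and assuming the ranges are contiguous. The helper names `linear_index` and `span` are hypothetical:

```python
CELLS_PER_ROW = 94


def linear_index(row: int, cell: int) -> int:
    """1-based position of a row-cell code point within its plane."""
    return (row - 1) * CELLS_PER_ROW + cell


def span(first: tuple, last: tuple) -> int:
    """Number of code points from first to last inclusive, assuming no gaps."""
    return linear_index(*last) - linear_index(*first) + 1


if __name__ == "__main__":
    print(span((1, 1), (66, 38)))    # 6148, the part adopted as plane 3
    print(span((66, 39), (68, 21)))  # 171, the part distributed amongst plane 4
    print(span((1, 1), (68, 21)))    # 6319, the whole 1988 extension
```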

The third edition of the standard, published in 2007, added the Euro sign, ideographic zero, kana and extensions to the existing bopomofo and Roman alphabet support to plane 1. It introduced planes 10 through 14, containing additional hanzi, and incorporated the existing plane 15 extension into the standard itself (with gaps left where the characters already existed in planes 4 through 7). It also added 128 further hanzi to plane 3, starting at code point 68-40,^[4]^{: 115–122} based on the additions to the version of the 1988 plane 14 which had been submitted for inclusion in Unicode.

Plane numbering

CNS 11643 plane numbering in different editions, drafts or implementations
Plane	T1	T2	(UDC)	(IBM)	T3	TF	T4	T5	T6	T7	(Post-1992)	(Post-2007)
ISO-IR	171	172	-	-	183^[a]	-	184	185	186	187	-	-
1986 edition	1	2	12–15	-	-	-	-	-	-	-	-	-
IBM code page 964^[7]	1	2	12	13	-	-	-	-	-	-	-	-
1988 extension	1	2	12–13	-	14^[b]	-	-	-	-	-	-	-
1990 extension	1	2	12–13	-	14^[b]	15	-	-	-	-	-	-
CJK-JRG version	1	2	-	-	14^[c]	-	-	-	-	-	-	-
1992 edition	1	2	12–15	-	3^[a]	-	4	5	6	7	-	-
ICU 2000^[8]	1	2	-	-	3^[d]	9	4	5	6	7	-	-
2007 edition	1	2	-	-	3^[d]	15	4	5	6	7	8–14	-
ICU 2014^[9]	1	2	12	13	3^[d]	15	4	5	6	7	-	-
Post-2007^[2]	1	2	-	-	3^[e]	15	4	5	6	7	8–14	16–19

^[a] 01-01 through 66-38 range only
^[b] 01-01 through 66-38 and 66-39 through 68-21 ranges
^[c] 01-01 through 66-38, 66-39 through 68-21 and 68-40 through 71-10 ranges
^[d] 01-01 through 66-38 and 68-40 through 71-10 ranges
^[e] 01-01 through 66-38 and 68-40 through 71-10 ranges, plus additions^[10]

Current purpose and relationship to Unicode

The CNS 11643 repertoire includes characters used for administrative purposes in Taiwan, including household registration and ID cards,^[3] in addition to characters used in education.^[11] In particular, characters in planes 1 and 2 are used in education.^[12] Only the characters used in education are subjected to glyph-form normalisation in CNS 11643.^[11] The repertoire continues to be expanded, with additional planes numbered up to 19 having been drafted, but not yet published as part of a CNS 11643 edition.^[2] A 2022 amendment to the 2007 edition appended U+7934 礴 (CJK UNIFIED IDEOGRAPH-7934) to the end of plane 2, and corrected several glyph forms in planes 1 and 2.^[12]

Although the 1992 and 2007 editions of CNS 11643, in addition to more recent working drafts, serve as the Unihan sources for reference glyphs for CJK Unified Ideographs submitted for use in Taiwan,^[2] there remain, as of 2017, several thousand CNS 11643 characters with no corresponding Unicode character, or which do not round-trip through Unicode, mostly in planes 10 through 14. These are mapped to the Unicode Supplementary Private Use Area.^[13]

In some cases, two or more CNS 11643 characters correspond to a single Unicode CJK Unified Ideograph. These cases are (except where covered by the CJK Compatibility Ideographs Supplement block) currently mapped to Unicode Supplementary Private Use Area code points,^[11] but the Taipei Computer Association, participating in the Ideographic Research Group, has been evaluating the feasibility of registering them as Ideographic Variation Sequences at some point in the future.^[11]^[14]
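
The Unihan source references discussed in this section identify a CNS 11643 position in a compact form; the reference list gives the example T3-6734 for plane 3 code point 71-20. The sketch below converts such a reference to plane, row and cell. It assumes that the label after "T" is a hexadecimal plane number (consistent with TF corresponding to plane 15 in the plane numbering table) and that the four hex digits are two 7-bit bytes, each offset by 0x20 from the row or cell number. The function name `parse_t_source` is hypothetical, and other forms of T-source value are not handled.

```python
import re

# Assumed shape: "T" + hex plane label + "-" + two hex bytes (row byte, cell byte).
_T_SOURCE = re.compile(r"^T([0-9A-F]+)-([0-9A-F]{2})([0-9A-F]{2})$")


def parse_t_source(value: str) -> tuple:
    """Convert a reference such as 'T3-6734' into (plane, row, cell)."""
    match = _T_SOURCE.match(value)
    if match is None:
        raise ValueError(f"unrecognised T-source reference: {value!r}")
    plane = int(match.group(1), 16)
    row = int(match.group(2), 16) - 0x20   # 0x21..0x7E maps to 1..94
    cell = int(match.group(3), 16) - 0x20
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"bytes outside the 94x94 grid: {value!r}")
    return plane, row, cell


if __name__ == "__main__":
    print(parse_t_source("T3-6734"))  # (3, 71, 20)
```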

Relationship to Big5

Levels 1 and 2 of the Big5 encoding correspond mostly to CNS 11643 planes 1 and 2, respectively, with occasional differences in order, and with two duplicate hanzi existing in Big5 but not in CNS 11643. They can be mapped using a list of ranges.^[15]^[16] However, the 213 classical radicals in CNS 11643 plane 1 are additional to the characters available in Big5 (although they can be lossily mapped to the corresponding hanzi characters in Big5 or HKSCS),^[5] and further additional characters were added to CNS 11643 plane 1 in 2007.^[4]^{: 115–122} The Big5-2003 variant of Big5 is defined as a partial encoding of CNS 11643.

Within the Big5 hanzi repertoire, only one level 1 character is conventionally mapped to Unicode differently from its counterpart in the first two CNS 11643 planes: the Big5 character is mapped to U+5F5D (彝), whereas its CNS plane 1 counterpart is mapped to a related variant at U+5F5E (彞);^[17] U+5F5D is separately included in CNS 11643 plane 3.^[5] However, some variant mappings for Big5, such as some defined by IBM, use U+5F5E rather than U+5F5D.^[18] Similarly, a single character from Big5 level 2 (including its IBM variant)^[19] is mapped to a different Unicode code point than its CNS 11643 plane 2 counterpart: the Big5 character is mapped to U+5284 (劄), while the Unihan database currently maps the CNS 11643 character to U+7B9A (箚); U+5284 appears in CNS 11643 plane 14.^[5]

References

This page is based on the information on the CNS official web site.

^[1] ECMA (1993-01-21). Chinese Standard Interchange Code (CSIC) - Set 1 (PDF). ITSCJ/IPSJ. ISO-IR-171.
^[2] Lunde, Ken; Cook, Richard (2024-07-31). "kIRG_TSource". Unicode Han Database (Unihan) (Unicode Standard Annex). Revision 37. Unicode Consortium. UAX #38.
^[3] "TCA's submission for the CJK extension IRG Working Set 2021" (PDF). 2021-05-07. ISO/IEC JTC1/SC2/WG2/IRG N2486.
^[4] Lunde, Ken (2008). "3. Character Set Standards". CJKV Information Processing (2nd ed.). O'Reilly Media. ISBN 9780596514471.
^[5] Lunde, Ken (2022-11-30). "Proposal to enhance the provisional kBigFive property" (PDF). UTC L2/22-288.
^[6] "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. Version 1.0. Unicode Consortium.
^[7] "IBM-964_P110-1999". ICU Data Repository. IBM/Unicode Consortium. 2009 [1999].
^[8] Viswanadha, Raghuram (2003) [2000-08-30]. "CNS-11643-1992". International Components for Unicode. IBM/Unicode Consortium.
^[9] "EUC-TW-2014: Update of EUC-TW based on IBM-964". International Components for Unicode. IBM/Unicode Consortium. 2014.
^[10] e.g. "Unihan data for U+2E83A", Unihan Database Lookup, Unicode Consortium has the source reference T3-6734, i.e. plane 3 code point 71-20.
^[11] "4. About glyph normalization" (PDF). Response to normalization and meaning issues on TCA characters in WS2021. 2022-03-14. pp. 3–5. ISO/IEC JTC1/SC2/WG2/IRG N2546.
^[12] "T-Source Glyph Correction and Horizontal Extension" (PDF). 2022-10-18. ISO/IEC JTC1/SC2/WG2/IRG N2580.
^[13] "CNS 11643 in Unicode's Supplementary Private Use Area". [chinese mac]. Council on East Asian Studies at Yale University.
^[14] Taipei Computer Association (2021-09-10). "Activity Report from TCA" (PDF). ISO/IEC JTC1/SC2/WG2/IRG N2502.
^[15] Lunde, Ken (1995-12-18). "4.3: CJK Character Set Compatibility Issues - Chinese (Taiwan)". CJK.INF Version 1.9.
^[16] Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "RFC 1922: Chinese Character Encoding for Internet Messages". Requests for Comments. IETF.
^[17] Lunde, Ken (2018-02-15). "Exploring IICore—Part 4". CJK Type Blog. Adobe Inc.
^[18] "ibm-950_P110-1999 (lead byte 0xC2)". International Components for Unicode Converter Explorer. Unicode Consortium. Archived from the original on 2021-07-12.
^[19] "ibm-950_P110-1999.ucm". ICU Data Repository. IBM/Unicode Consortium. 2007. <U5284> \xE3\x5A |0

External links

CNS 11643 official web site
Current CNS 11643 open data, including mapping data
Unicode Consortium mappings for CNS 11643-1986: planes 1 and 2, plus the 1988 plane 14 (not the 2007 plane 14) with extensions. Uses a single prefixed hex digit to indicate plane.
CNS 11643 mappings from International Components for Unicode (ICU):
- "CNS-11643-1992": original version, current version. The original version of the mapping includes standard planes 1–7 but includes the plane 15 layout as plane 9; the current version includes only planes 1 and 2. Uses prefixed 0x81 through 0x89 to indicate plane.
- "EUC-TW-2014": standard assignments for planes 1 through 7 and 15, and IBM corporate assignments in planes 12 and 13. CNS codes in EUC format with two-byte plane 1.
ISO-IR registered CNS-11643 code charts: plane 1, plane 2, plane 3, plane 4, plane 5, plane 6, plane 7

Retrieved from "https://en.wikipedia.org/w/index.php?title=CNS_11643&oldid=1265180502"
