Windows code page 936 (abbreviated MS936, Windows-936 or (ambiguously) CP936),[1] is Microsoft's legacy (pre-Unicode) character encoding for representing simplified Chinese text on computers. It is one of the four Windows DBCSs for East Asian languages, accompanying code pages 932 (Japanese), 949 (Korean) and 950 (Traditional Chinese). It is a variant of the Mainland Chinese Guójiā Biāozhǔn Kuòzhǎn (GBK) encoding, and roughly corresponds to IBM code page 1386 (CP1386 or IBM-1386).
MIME / IANA | GBK |
---|---|
Language(s) | Mainly used for Simplified Chinese, but also supports Traditional Chinese, Japanese, English, Russian and (partially) Greek. |
Classification | GBK variant, Extended ASCII,[a] variable-width encoding, CJK encoding |
Extends | EUC-CN |
Based on | GBK (GB 13000.1-93 annex) |
Succeeded by | Code page 54936 (GB 18030) |
History
Originally, Windows-936 covered GB 2312 (in its EUC-CN form), but it was expanded to cover most of GBK with the release of Windows 95. The Euro sign (€), not defined in GBK, is encoded as 0x80 in Windows-936 and IBM-1386. On the other hand, 95 characters defined in GBK 1.0 were initially not encoded in Windows-936. This was partly resolved in later versions of Windows: as of Windows 7, all GBK characters not in the Unicode BMP Private Use Area can be displayed using code page 936, but encoding the 95 characters was still not supported as of 2014.
Windows code page 936 was superseded by code page 54936 (GB 18030), but as of 2014 it was still in widespread use. The Windows console uses code page 936 as the default code page for simplified Chinese installations, although part of GB 18030 was made mandatory for all software products sold in China. In 2002, the IANA Internet name GBK was registered with Windows-936's mapping,[2][3] making it the de facto GBK definition on the Internet.
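The 0x80 Euro sign assignment described above can be illustrated with a small decoding sketch. It is not Microsoft's implementation: the function name `decode_windows936_sketch` is hypothetical, Python's built-in `gbk` codec stands in for the Windows-936 double-byte table (it may differ from Microsoft's mapping in details), and the lead-byte range 0x81 to 0xFE is taken from the general GBK layout rather than from this article.

```python
def decode_windows936_sketch(data: bytes) -> str:
    """Decode bytes, treating 0x80 as the Euro sign (illustrative only)."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single-byte ASCII range.
            out.append(chr(b))
            i += 1
        elif b == 0x80:
            # Single-byte Euro sign, as assigned in Windows-936 and IBM-1386.
            out.append("\u20ac")
            i += 1
        else:
            # Assumed lead byte (0x81-0xFE): decode a two-byte sequence with
            # Python's GBK codec; invalid or truncated input raises
            # UnicodeDecodeError.
            out.append(data[i:i + 2].decode("gbk"))
            i += 2
    return "".join(out)


price = decode_windows936_sketch(b"\x80100")
mixed = decode_windows936_sketch("中".encode("gbk") + b"\x80")
```

Walking the input byte by byte matters here because 0x80 can also occur as the second byte of a double-byte GBK sequence, so a blanket byte substitution would corrupt such characters.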
Terminology
The name "code page 936" is ambiguous. IBM's code page 936,[4] an obsolete IBM 5550 encoding, is also a Simplified Chinese encoding, but it uses a different encoding method for GB 2312 (Shift GB) and so is entirely incompatible with Windows code page 936. This contrasts with IBM code page 932, which is, to a first approximation,[a] a subset of Windows code page 932. International Components for Unicode, however, does not include an IBM-936 codec, and uses the Windows code page for the cp936 label.[1] IBM's code page for GBK coverage is code page 1386, which is defined as a combination of the single-byte Code page 1114 and the double-byte Code page 1385.[5]
The concepts of "Windows-936", "GBK", "GB2312" and "EUC-CN" are sometimes conflated in various software products. EUC-CN is registered with the IANA as GB2312, although it is only one specific encoding format of GB 2312: a variable-width, 8-bit, stateless one. GB 2312 also has other, less widely used, encoding formats such as HZ-GB-2312, ISO-2022-CN or the aforementioned Shift GB.
GBK is a superset of EUC-CN (although not itself an EUC code) and superseded GB 2312 long ago. In addition, Microsoft software continued to assign the GB2312 encoding label to code page 936 even after extending it to implement GBK rather than EUC-CN. As a result, most modern Windows-based software products that offer "GB 2312" as a character encoding option actually mean partial support for GBK via Windows-936, rather than EUC-CN or other encoding formats of GB 2312. This can be observed in products such as Microsoft Internet Explorer and Notepad++.
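The relationship between these labels can be checked with Python's standard codecs, shown here as a rough sketch. Python's `gb2312` codec is a strict EUC-CN implementation and its `gbk` codec is Python's own GBK table, so the results describe Python's behaviour and are not guaranteed to match Microsoft's Windows-936 mapping in every detail. The helper `encodable` and the two sample characters are illustrative choices, not taken from the article.

```python
import codecs

# Python resolves the "cp936" and "ms936" labels to its GBK codec.
cp936_codec = codecs.lookup("cp936").name
ms936_codec = codecs.lookup("ms936").name

common = "中"    # a character present in GB 2312
gbk_only = "镕"  # U+9555, a character added by GBK and absent from GB 2312


def encodable(text: str, codec: str) -> bool:
    """Return True if `text` can be encoded with the named codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# A GB 2312 character gets the same bytes under EUC-CN and GBK,
# which is what makes GBK a superset of EUC-CN.
same_bytes = common.encode("gb2312") == common.encode("gbk")

print(cp936_codec, ms936_codec, same_bytes)
print(encodable(gbk_only, "gbk"), encodable(gbk_only, "gb2312"))
```

Text labelled "GB2312" by Windows software may therefore contain characters such as the second sample, which a strict EUC-CN decoder rejects; decoding such text as GBK is the more forgiving choice.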
Footnotes
- ^ If the character variant swaps from 1983 are ignored.
References
- ^ a b "windows-936-2000 (alias cp936)". ICU Demonstration - Converter Explorer. International Components for Unicode.
- ^ "Character Sets". Retrieved 3 October 2016.
- ^ Application of IANA Charset Registration for GBK
- ^ "Coded character set identifiers - CCSID 936". IBM Globalization. IBM. Archived from the original on 2014-12-01.
- ^ "Coded character set identifiers - CCSID 1386". IBM. Archived from the original on 2014-11-29.
External links
Windows-936:
- Microsoft's reference for Windows-936
- Code page file for Windows-936
- Mapping of Windows-936 to Unicode
- ICU demonstration of Windows-936
- International Components for Unicode (ICU), windows-936-2000.ucm
IBM-1386: