Shift JIS

From Wikipedia, the free encyclopedia

Japanese character encoding

Shift JIS
MIME / IANA	Shift_JIS
Alias(es)	MS_Kanji,^[1] PCK^[2]^[3]
Language(s)	Primarily Japanese, but also supporting English, Russian, Bulgarian, Greek
Standard	JIS X 0208:1997 Appendix 1
Classification	Extended ISO 646,^[a] variable-width encoding, CJK encoding
Extends	JIS X 0201 8-bit format
Transforms / Encodes	JIS X 0208
Succeeded by	Shift_JIS-2004 (JIS), Windows-31J (web)

Shift JIS (also SJIS, MIME name Shift_JIS, known as PCK in Solaris contexts)^[2]^[3] is a character encoding for the Japanese language, originally developed by the Japanese company ASCII Corporation^[b] in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1.

Shift JIS is based on character sets defined within the JIS standards JIS X 0201:1997 (for the single-byte characters) and JIS X 0208:1997 (for the double-byte characters).

As of January 2025, less than 0.05% of surveyed web pages used Shift JIS (in practice decoded as its superset, Windows-31J), a decline from 1.3% in July 2014.^[4] Shift JIS is the third-most declared character encoding for Japanese websites, declared by 1.0% of sites in the .jp domain, while UTF-8 is used by 99% of Japanese websites.^[5]^[6] Because pages declared as Shift JIS are decoded as Windows-31J, it is in effect Windows-31J that is the third-most popular encoding.

Shift JIS is also sometimes used in QR codes, which are a Japanese invention. QR codes also allow UTF-8, which may be the preferred choice.^[7]^[8]

Structure

Shift JIS is an extension of the single-byte encoding JIS X 0201:1997 that uses unassigned code points in JIS X 0201 to encode the double-byte JIS X 0208:1997 character set. The lead bytes for the double-byte characters are "shifted" around the 63 halfwidth katakana characters in the single-byte range 0xA1 to 0xDF.

The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign (U+00A5) at 0x5C and an overline (U+203E) at 0x7E in place of the ASCII character set's backslash and tilde respectively (these deviations from ASCII align with JIS X 0201). The single-byte characters from 0xA1 to 0xDF map to the half-width katakana characters found in JIS X 0201.
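As a minimal sketch of the single-byte layer described above, the following Python function maps one byte to a Unicode character. The function name is hypothetical, and the target range U+FF61 to U+FF9F for the half-width katakana is taken from Unicode's Halfwidth and Fullwidth Forms block rather than from the text above.

```python
def decode_single_byte(b: int) -> str:
    """Map one single-byte Shift JIS code to a Unicode character (illustrative)."""
    if b == 0x5C:
        return "\u00a5"            # yen sign in place of the ASCII backslash
    if b == 0x7E:
        return "\u203e"            # overline in place of the ASCII tilde
    if 0x00 <= b <= 0x7F:
        return chr(b)              # all other bytes in this range match ASCII
    if 0xA1 <= b <= 0xDF:
        # Assumption: half-width katakana sit at U+FF61..U+FF9F,
        # a constant offset of 0xFEC0 from the byte value.
        return chr(b + 0xFEC0)
    raise ValueError(f"0x{b:02X} is not a single-byte character")


print(decode_single_byte(0x41), decode_single_byte(0x5C), decode_single_byte(0xB1))
```

Bytes outside the two single-byte ranges raise an error here, because they are either lead bytes of double-byte characters or unassigned.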

For double-byte characters, the first byte is always in the range 0x81 to 0x9F or the range 0xE0 to 0xEF (these ranges are unassigned in JIS X 0201). If the first byte of the corresponding JIS X 0208 sequence is odd, the second byte must be in the range 0x40 to 0x9E (but cannot be 0x7F); if it is even, the second byte must be in the range 0x9F to 0xFC.
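The byte ranges above are enough to split a byte string into characters. The following is a rough sketch with hypothetical function names, not a full validator. It accepts either trail-byte range after any lead byte, because each Shift JIS lead byte covers one odd and one even JIS X 0208 row.

```python
def is_lead_byte(b: int) -> bool:
    """True if b can start a double-byte character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def is_trail_byte(b: int) -> bool:
    """True if b can be the second byte of a double-byte character."""
    return 0x40 <= b <= 0xFC and b != 0x7F


def split_chars(data: bytes) -> list:
    """Split a Shift JIS byte string into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            if i + 1 >= len(data) or not is_trail_byte(data[i + 1]):
                raise ValueError(f"bad double-byte sequence at offset {i}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            # Single-byte code; unassigned bytes are passed through unchecked.
            chars.append(data[i:i + 1])
            i += 1
    return chars


print(split_chars(bytes([0x41, 0x83, 0x41, 0xB1])))
```

The example input contains the byte 0x41 twice: once as the ASCII letter "A" and once as the trail byte of a double-byte character.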

Shift JIS guarantees only that the first byte of a two-byte character has its high bit set (0x80–0xFF); the second byte can be either high or low. The appearance of byte values 0x40–0x7E as second bytes of code words makes reliable Shift JIS detection difficult, because the same values are used for ASCII characters. String searches are also difficult: because the same byte value can be either a first or a second byte, a simple search can match the second byte of one character followed by the first byte of the next, which is not a valid Shift JIS character. String-searching algorithms must therefore be tailored to Shift JIS.
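The search problem can be illustrated with a short sketch. The helper names are hypothetical, and the approach shown (rejecting matches that do not start on a character boundary) is one possible way to tailor a search, assuming both haystack and needle are well-formed Shift JIS.

```python
def char_starts(data: bytes) -> set:
    """Offsets at which a character begins, scanning from the start."""
    starts = set()
    i = 0
    while i < len(data):
        starts.add(i)
        lead = 0x81 <= data[i] <= 0x9F or 0xE0 <= data[i] <= 0xEF
        i += 2 if lead else 1
    return starts


def find_sjis(haystack: bytes, needle: bytes) -> int:
    """Like bytes.find, but only accepts matches on character boundaries."""
    starts = char_starts(haystack)
    pos = haystack.find(needle)
    while pos != -1:
        if pos in starts:
            return pos
        pos = haystack.find(needle, pos + 1)
    return -1


# Katakana "ア" is 0x83 0x41 in Shift JIS; it is followed by ASCII "B" and "A".
text = bytes([0x83, 0x41, 0x42, 0x41])
print(text.find(b"A"))        # 1: the naive search hits the trail byte of "ア"
print(find_sjis(text, b"A"))  # 3: the real ASCII "A"
```

Scanning from the start of the string is what makes the boundaries reliable; a single byte taken out of context cannot be classified as a first or second byte.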

Compatibility

Shift JIS is fully backwards compatible with the JIS X 0201 single-byte encoding, meaning that any valid JIS X 0201 string is also a valid Shift JIS string.

Double-byte characters in JIS X 0208 need to be transformed in order to be encoded in Shift JIS. For a double-byte JIS X 0208 sequence $j_{1}j_{2}$,^[c] the transformation to the corresponding Shift JIS bytes $s_{1}s_{2}$ is:

$$s_{1}={\begin{cases}\left\lfloor {\frac {j_{1}+1}{2}}\right\rfloor +112&{\text{if }}33\leq j_{1}\leq 94\\\left\lfloor {\frac {j_{1}+1}{2}}\right\rfloor +176&{\text{if }}95\leq j_{1}\leq 126\end{cases}}$$

$$s_{2}={\begin{cases}j_{2}+31+\left\lfloor {\frac {j_{2}}{96}}\right\rfloor &{\text{if }}j_{1}{\text{ is odd}}\\j_{2}+126&{\text{if }}j_{1}{\text{ is even}}\end{cases}}$$
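The transformation can be written out directly in code. The sketch below follows the two formulas term by term; the function names are hypothetical, and the inverse function is derived here from the formulas rather than taken from the standard.

```python
def jis_to_sjis(j1: int, j2: int) -> tuple:
    """Convert a JIS X 0208 byte pair (each 33..126) to Shift JIS bytes."""
    if not (33 <= j1 <= 126 and 33 <= j2 <= 126):
        raise ValueError("j1 and j2 must be in the range 33..126")
    s1 = (j1 + 1) // 2 + (112 if j1 <= 94 else 176)
    if j1 % 2 == 1:
        s2 = j2 + 31 + j2 // 96   # the extra 1 for j2 >= 96 skips 0x7F
    else:
        s2 = j2 + 126
    return s1, s2


def sjis_to_jis(s1: int, s2: int) -> tuple:
    """Inverse of jis_to_sjis (derived from the formulas above)."""
    half = s1 - (112 if s1 <= 0x9F else 176)
    if s2 >= 0x9F:                # trail bytes 0x9F..0xFC: j1 was even
        return 2 * half, s2 - 126
    j2 = s2 - 32 if s2 >= 0x80 else s2 - 31
    return 2 * half - 1, j2       # trail bytes 0x40..0x9E: j1 was odd


# JIS 0x30 0x21 is the kanji "亜".
s1, s2 = jis_to_sjis(0x30, 0x21)
print(f"{s1:02X} {s2:02X}")       # 88 9F
```

Two consecutive JIS rows share one Shift JIS lead byte, which is why the parity of $j_{1}$ selects the trail-byte range.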

The competing 8-bit format EUC-JP, which does not support single-byte halfwidth katakana, allows for a cleaner and more direct conversion to and from JIS X 0208 code points, as all high-bit-set bytes are parts of a double-byte character and all codes from the ASCII range represent single-byte characters.
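For comparison, a sketch of the EUC-JP conversion for JIS X 0208 characters is shown below. It relies on the fact, not stated in the text above, that EUC-JP forms each byte by setting the high bit of the corresponding JIS X 0208 byte; the function names are hypothetical.

```python
def jis_to_eucjp(j1: int, j2: int) -> bytes:
    """Encode a JIS X 0208 byte pair in EUC-JP by setting both high bits."""
    return bytes([j1 | 0x80, j2 | 0x80])


def eucjp_to_jis(raw: bytes) -> tuple:
    """Recover the JIS X 0208 byte pair by clearing both high bits."""
    return raw[0] & 0x7F, raw[1] & 0x7F


encoded = jis_to_eucjp(0x30, 0x21)   # JIS 0x30 0x21 is the kanji "亜"
print(encoded.hex())                 # b0a1
print(eucjp_to_jis(encoded))         # (48, 33)
```

No floor division or parity test is needed, and no byte below 0x80 ever appears inside a double-byte character.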

Usage

HTML written in Shift JIS can still be interpreted to some extent when incorrectly tagged as ASCII, including when the charset tag is at the top of the document itself, because the characters that delimit HTML tags and fields (<, >, /, ", &, ;) are encoded as the same bytes as in ASCII, and those bytes do not appear in two-byte sequences.
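The claim about HTML delimiters can be checked against the byte ranges given in the Structure section: trail bytes start at 0x40 and lead bytes at 0x81. A small illustrative check, with hypothetical variable names:

```python
HTML_SYNTAX = '<>/"&;'
TRAIL_MIN = 0x40   # lowest possible second byte of a double-byte character

for ch in HTML_SYNTAX:
    # Each delimiter is printed with its byte value and whether it lies
    # below the trail-byte range.
    print(ch, hex(ord(ch)), ord(ch) < TRAIL_MIN)

html_safe = all(ord(ch) < TRAIL_MIN for ch in HTML_SYNTAX)
print(html_safe)   # True
```

The highest of these bytes is 0x3E (">"), so none of them can be mistaken for part of a double-byte character.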

Shift JIS can be used in string literals in programming languages such as C, but two things must be taken into consideration. Firstly, the escape character 0x5C, normally the backslash, is the half-width yen sign (¥) in Shift JIS. A programmer who is aware of this can write printf("ハローワールド¥n"); (where ハローワールド is Hello, world and ¥n is an escape sequence), assuming the I/O system supports Shift JIS output. Secondly, the 0x5C byte causes problems when it appears as the second byte of a two-byte character: it is interpreted as the start of an escape sequence, which corrupts the string unless it is followed by another 0x5C.
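To see which characters are affected by the second problem, the sketch below uses Python's built-in `shift_jis` codec to test whether a character's second byte is 0x5C. The helper name is hypothetical, and the sample characters were chosen for illustration.

```python
def has_5c_trail(ch: str) -> bool:
    """True if ch is a double-byte Shift JIS character ending in byte 0x5C."""
    raw = ch.encode("shift_jis")
    return len(raw) == 2 and raw[1] == 0x5C


for ch in "ソ表ア":
    print(ch, ch.encode("shift_jis").hex(), has_5c_trail(ch))
# ソ 835c True
# 表 955c True
# ア 8341 False
```

A compiler that is unaware of Shift JIS would read the second byte of ソ or 表 inside a string literal as the start of an escape sequence.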

Multiple versions

[Figure: Euler diagram comparing the repertoires of JIS X 0208, JIS X 0212, JIS X 0213, Windows-31J, the Microsoft standard repertoire and Unicode. Caption: Relationship between Shift_JIS variants on the PC and related encodings, including intersections and other subsets. Names given are descriptive.]

Many different versions of Shift JIS exist. There are two areas for expansion:

Firstly, JIS X 0208 does not fill the whole 94×94 space encoded for it in Shift JIS, so there is room for more characters here. These are really extensions to JIS X 0208 rather than to Shift JIS itself.

Secondly, Shift JIS has more encoding space than is needed for JIS X 0201 and JIS X 0208 (see § Shift JIS byte map below), and this space can be, and is, used for yet more characters (as either single-byte or double-byte characters).

Windows-932 / Windows-31J

Main article: Code page 932 (Microsoft Windows)

The most popular extension is Windows code page 932 (a CCSID also used for IBM's extension to Shift JIS), which is registered with the IANA as "Windows-31J",^[1] separately from Shift JIS. This was popularized by Microsoft, although Microsoft itself does not recognize the Windows-31J name and instead calls that variation "shift_jis".^[9]^[10] IBM's code page 943 includes the same double-byte codes as Microsoft's code page 932, while IBM's code page 932 includes fewer extensions (excluding those which Microsoft incorporates from NEC), and retains the character order from the 1978 edition of JIS X 0208, rather than implementing the character variant swaps from the 1983 standard.^[11]

Windows-31J assigns 0x5C to U+005C REVERSE SOLIDUS (the backslash), and 0x7E to U+007E TILDE, following US-ASCII.^[12] However, most localised fonts on Windows display U+005C as a yen sign for JIS X 0201 compatibility.^[13]^[14] Windows-31J includes several extensions, namely "NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)",^[1] in addition to setting some encoding space aside for end user definition.^[15]
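These properties can be observed with Python's built-in `cp932` codec, which implements Windows code page 932. The sketch below is illustrative only: the helper name is hypothetical, and the circled digit one (U+2460) is used as an example of an NEC special character from row 13.

```python
def can_encode(ch: str, codec: str) -> bool:
    """True if the given codec has a code for ch."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# 0x5C and 0x7E follow US-ASCII in code page 932.
backslash = bytes([0x5C]).decode("cp932")
tilde = bytes([0x7E]).decode("cp932")
print(repr(backslash), repr(tilde))

# U+2460 is available in code page 932 but not in Python's plain
# JIS X 0208 based shift_jis codec.
circled_one = "\u2460"
print(circled_one.encode("cp932").hex())   # 8740
print(can_encode(circled_one, "cp932"), can_encode(circled_one, "shift_jis"))
```

Text containing such extension characters therefore has to be handled with the `cp932` codec even when it is labelled as Shift JIS.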

Windows code page 932 is the version used in the W3C/WHATWG encoding standard used by HTML5, which includes the "formerly proprietary extensions from IBM and NEC" from Windows-31J in its table for JIS X 0208,^[16] and also treats the label "shift_jis" interchangeably with "windows-31j" with the intent of being "compatible with deployed content".^[17]

MacJapanese

The version of Shift JIS originating from the classic Mac OS (known as x-mac-japanese, Code page 10001^[9] or MacJapanese) assigned the tilde to 0x7E (following US-ASCII, not JIS X 0201, which assigns the overline here), but the yen sign to 0x5C (as in JIS X 0201 and standard Shift JIS). It also extended JIS X 0201 by assigning the backslash to 0x80 (corresponding to 0x5C in US-ASCII), the non-breaking space to 0xA0, the copyright sign to 0xFD, the trademark symbol to 0xFE and the half-width horizontal ellipsis to 0xFF. It also added extended double-byte characters, including 53 vertical presentation forms in the Shift_JIS range 0xEB41–0xED96, at 84 JIS rows down from their canonical forms, and 260 special characters in the Shift_JIS range 0x8540–0x886D.^[18] This variant was introduced in KanjiTalk version 7.^[19]

However, certain Mac OS typefaces used other variants. Sai Mincho and Chu Gothic use a "PostScript" variant of MacJapanese, which included additional vertical presentation forms and a different set of extended special characters, based on the NEC special characters, some of which were only available in the printer versions of the fonts.^[18] Older versions of Maru Gothic and Hon Mincho from System 7.1 encoded vertical presentation forms at 10 (not 84) JIS rows down from their canonical forms, and did not include the special character extensions; this was subsequently changed.^[18]^[20] The typical variant used with KanjiTalk version 6 placed the vertical presentation forms 10 rows down, and also used the NEC extension layout for row 13.^[21]

Shift_JISx0213 and Shift_JIS-2004

Shift_JIS-2004
Alias(es)	Shift_JISx0213
Language(s)	Japanese, Ainu, English, Russian
Standard	JIS X 0213
Extends	Shift_JIS (1997), JIS X 0201 (8-bit)
Transforms / Encodes	JIS X 0213
Preceded by	Shift_JIS (1997)

The newer JIS X 0213 standard defines an extended variant of Shift_JIS referred to as Shift_JISx0213 (in a previous version of the standard) or Shift_JIS-2004. It is a superset of standard Shift JIS.^[22]

In order to represent the allocated rows on both planes of JIS X 0213, Shift_JIS-2004 uses the following method of mapping codepoints.^[23]

$$s_{1}={\begin{cases}\left\lfloor {\frac {k+257}{2}}\right\rfloor &{\text{if }}m=1{\text{ and }}1\leq k\leq 62\\\left\lfloor {\frac {k+385}{2}}\right\rfloor &{\text{if }}m=1{\text{ and }}63\leq k\leq 94\\\left\lfloor {\frac {k+479}{2}}\right\rfloor -\left\lfloor {\frac {k}{8}}\right\rfloor \times 3&{\text{if }}m=2{\text{ and }}k\in \{1,3,4,5,8,12,13,14,15\}\\\left\lfloor {\frac {k+411}{2}}\right\rfloor &{\text{if }}m=2{\text{ and }}78\leq k\leq 94\end{cases}}$$

$$s_{2}={\begin{cases}t+63&{\text{if }}k{\text{ is odd and }}1\leq t\leq 63\\t+64&{\text{if }}k{\text{ is odd and }}64\leq t\leq 94\\t+158&{\text{if }}k{\text{ is even}}\end{cases}}$$

In the above, $s_{1}s_{2}$ is a two-byte Shift_JIS-2004 sequence, $m$ is the plane (面, men, surface) number (1 or 2), $k$ is the row (区, ku, ward) number (1–94) and $t$ is the cell (点, ten, point) number (1–94). The ku and ten numbers are equivalent to $j_{1}-32$ and $j_{2}-32$ respectively, where $j_{1}j_{2}$ is a two-byte JIS sequence referencing a given plane.
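The mapping above translates into a short function. This is a sketch that mirrors the formulas case by case; the function and constant names are hypothetical.

```python
# Plane 2 rows below 78 that are allocated, as listed in the formula for s1.
PLANE2_LOW_ROWS = {1, 3, 4, 5, 8, 12, 13, 14, 15}


def men_ku_ten_to_sjis2004(m: int, k: int, t: int) -> tuple:
    """Map plane m, row k, cell t to a Shift_JIS-2004 byte pair."""
    if not (1 <= k <= 94 and 1 <= t <= 94):
        raise ValueError("row and cell must be in the range 1..94")
    if m == 1:
        s1 = (k + 257) // 2 if k <= 62 else (k + 385) // 2
    elif m == 2 and k in PLANE2_LOW_ROWS:
        s1 = (k + 479) // 2 - (k // 8) * 3
    elif m == 2 and 78 <= k <= 94:
        s1 = (k + 411) // 2
    else:
        raise ValueError("row is not allocated in this plane")
    if k % 2 == 1:
        s2 = t + 63 if t <= 63 else t + 64   # skips 0x7F
    else:
        s2 = t + 158
    return s1, s2


for mkt in [(1, 1, 1), (1, 89, 1), (2, 1, 1), (2, 8, 1), (2, 94, 94)]:
    s1, s2 = men_ku_ten_to_sjis2004(*mkt)
    print(mkt, f"{s1:02X} {s2:02X}")
# (1, 1, 1) 81 40
# (1, 89, 1) ED 40
# (2, 1, 1) F0 40
# (2, 8, 1) F0 9F
# (2, 94, 94) FC FC
```

Plane 1 reproduces the lead bytes of standard Shift JIS (0x81–0x9F and 0xE0–0xEF), while the allocated plane 2 rows are packed into lead bytes 0xF0–0xFC.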

The same set of characters can be represented by EUC-JIS-2004, the EUC-JP based counterpart.

Some of the additions collide with popular Shift JIS extensions, including Windows code page 932, which is used in web standards (see above). For example, compare plane 1 row 89 in JIS X 0213 (beginning 硃, 硎, 硏...)^[24] to row 89 in the JIS X 0208 variant defined in web standards (beginning 纊, 褜, 鍈...).^[25] In addition, some of the characters map to Unicode characters beyond the BMP.
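The collision can be observed by decoding the same bytes with two codecs. The sketch below assumes Python's built-in `cp932` and `shift_jis_2004` codecs as stand-ins for Windows code page 932 and Shift_JIS-2004, and uses the bytes 0xED 0x40, which correspond to row 89, cell 1 under the Shift JIS row mapping.

```python
raw = bytes([0xED, 0x40])   # row 89, cell 1

as_cp932 = raw.decode("cp932")            # NEC selection of IBM extensions
as_2004 = raw.decode("shift_jis_2004")    # JIS X 0213 plane 1, row 89

# The two codecs are expected to disagree; per the text above these should
# be the first characters of the respective rows.
print(as_cp932, as_2004, as_cp932 == as_2004)
```

Because the byte sequences are identical, the intended variant cannot be detected from the data alone and has to be known from context.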

Other variants

Further information: Implementation of emojis § JIS, Shift_JIS and Private Use Area encodings

The space with lead bytes 0xF5 to 0xF9 (beyond the region used for JIS X 0208) is used by Japanese mobile phone operators for pictographs for use in e-mail.^[26] KDDI goes further and defines hundreds more in the space with lead bytes 0xF3 and 0xF4.^[27]

Beyond even this, there have been numerous minor variations made on Shift JIS, with individual characters here and there altered. Most of these extensions and variants have no IANA registration, so there is much scope for confusion if the extensions are used.

One further variant is needed to embed Shift JIS in source code strings of C and similar programming languages. It doubles the byte 0x5C when it appears as the second byte of a two-byte character, but not when it appears as a single "¥" (ASCII: "\") character, because 0x5C is the beginning of an escape sequence. The best way of handling this is a special editor that encodes Shift JIS this way.
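The doubling rule can be sketched as a byte-level filter. The function name is hypothetical, and the sketch assumes its input is well-formed standard Shift JIS with lead bytes in 0x81–0x9F and 0xE0–0xEF.

```python
def escape_for_c_literal(data: bytes) -> bytes:
    """Double 0x5C only where it is the second byte of a two-byte character."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        lead = 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF
        if lead and i + 1 < len(data):
            out += data[i:i + 2]
            if data[i + 1] == 0x5C:
                out.append(0x5C)   # protect the trail byte from the compiler
            i += 2
        else:
            out.append(b)          # single bytes, including a lone 0x5C, pass through
            i += 1
    return bytes(out)


# "ソ" (0x83 0x5C) followed by the escape sequence ¥n (0x5C 0x6E).
source = bytes([0x83, 0x5C, 0x5C, 0x6E])
print(escape_for_c_literal(source).hex())   # 835c5c5c6e
```

Only the trail byte of ソ is doubled; the single 0x5C that begins the ¥n escape sequence is left unchanged.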

Shift JIS byte map

As defined in JIS X 0208:1997

The chart below gives the detailed meaning of each byte in a stream encoded in standard Shift JIS (conforming to JIS X 0208:1997).

First byte
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0	␀	␁	␂	␃	␄	␅	␆	␇	␈	␉	␊	␋	␌	␍	␎	␏
1	␐	␑	␒	␓	␔	␕	␖	␗	␘	␙	␚	␛	␜	␝	␞	␟
2	␠	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5	P	Q	R	S	T	U	V	W	X	Y	Z	[	¥	]	^	_
6	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7	p	q	r	s	t	u	v	w	x	y	z	{	\|	}	‾	␡
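8	0x80: unused as first byte; 0x81–0x8F: first byte of a double-byte JIS X 0208 character
9	0x90–0x9F: first byte of a double-byte JIS X 0208 character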
A		｡	｢	｣	､	･	ｦ	ｧ	ｨ	ｩ	ｪ	ｫ	ｬ	ｭ	ｮ	ｯ
B	ｰ	ｱ	ｲ	ｳ	ｴ	ｵ	ｶ	ｷ	ｸ	ｹ	ｺ	ｻ	ｼ	ｽ	ｾ	ｿ
C	ﾀ	ﾁ	ﾂ	ﾃ	ﾄ	ﾅ	ﾆ	ﾇ	ﾈ	ﾉ	ﾊ	ﾋ	ﾌ	ﾍ	ﾎ	ﾏ
D	ﾐ	ﾑ	ﾒ	ﾓ	ﾔ	ﾕ	ﾖ	ﾗ	ﾘ	ﾙ	ﾚ	ﾛ	ﾜ	ﾝ	ﾞ	ﾟ
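E	0xE0–0xEF: first byte of a double-byte JIS X 0208 character
F	0xF0–0xFF: unused as first byte of a JIS X 0208 character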

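Second byte
Range	Meaning
0x00–0x3F	Unused as second byte of a JIS X 0208 character
0x40–0x7E	Second byte of a double-byte JIS X 0208 character whose first byte of the JIS sequence was odd
0x7F	Unused as second byte of a JIS X 0208 character
0x80–0x9E	Second byte of a double-byte JIS X 0208 character whose first byte of the JIS sequence was odd
0x9F–0xFC	Second byte of a double-byte JIS X 0208 character whose first byte of the JIS sequence was even
0xFD–0xFF	Unused as second byte of a JIS X 0208 character

Key to the first-byte chart
Range	Meaning
0x00–0x1F, 0x7F	Non-printable ASCII character
0x20–0x5B, 0x5D–0x7D	Unaltered ASCII character
0x5C, 0x7E	Modified ASCII character
0xA1–0xDF	Single-byte half-width katakana
0x81–0x9F, 0xE0–0xEF	First byte of a double-byte JIS X 0208 character
0x80, 0xA0, 0xF0–0xFF	Unused as first byte of a JIS X 0208 character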

With vendor or JIS X 0213 extensions

Some of the bytes which are not used for single-byte codes or initial bytes in JIS X 0208:1997 are used by certain extensions, resulting in the layout detailed in the chart below.

First byte
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0	␀	␁	␂	␃	␄	␅	␆	␇	␈	␉	␊	␋	␌	␍	␎	␏
1	␐	␑	␒	␓	␔	␕	␖	␗	␘	␙	␚	␛	␜	␝	␞	␟
2	␠	!	"	#	$	%	&	'	(	)	*	+	,	-	.	/
3	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5	P	Q	R	S	T	U	V	W	X	Y	Z	[	¥	]	^	_
6	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7	p	q	r	s	t	u	v	w	x	y	z	{	\|	}	‾	␡
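8	0x80: not used as first byte, used by some single-byte extensions; 0x81–0x84 and 0x88–0x8F: first byte used by JIS X 0208; 0x85–0x87: first byte unallocated in JIS X 0208 but used by JIS X 0213 plane 1 or by vendor extensions
9	0x90–0x9F: first byte used by JIS X 0208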
A		｡	｢	｣	､	･	ｦ	ｧ	ｨ	ｩ	ｪ	ｫ	ｬ	ｭ	ｮ	ｯ
B	ｰ	ｱ	ｲ	ｳ	ｴ	ｵ	ｶ	ｷ	ｸ	ｹ	ｺ	ｻ	ｼ	ｽ	ｾ	ｿ
C	ﾀ	ﾁ	ﾂ	ﾃ	ﾄ	ﾅ	ﾆ	ﾇ	ﾈ	ﾉ	ﾊ	ﾋ	ﾌ	ﾍ	ﾎ	ﾏ
D	ﾐ	ﾑ	ﾒ	ﾓ	ﾔ	ﾕ	ﾖ	ﾗ	ﾘ	ﾙ	ﾚ	ﾛ	ﾜ	ﾝ	ﾞ	ﾟ
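E	0xE0–0xEA: first byte used by JIS X 0208; 0xEB–0xEF: first byte unallocated in JIS X 0208 but used by JIS X 0213 plane 1 or by vendor extensions
F	0xF0–0xFC: first byte beyond JIS X 0208, used for JIS X 0213 plane 2 or for unrelated extensions; 0xFD–0xFF: not used as first byte, used by some single-byte extensions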

Second byte
Range	Meaning
0x00–0x3F, 0x7F, 0xFD–0xFF	Unused as second byte of a double-byte character
0x40–0x7E, 0x80–0x9E	Second byte of a double-byte character whose first byte of the JIS sequence was odd
0x9F–0xFC	Second byte of a double-byte character whose first byte of the JIS sequence was even

Key to the first-byte chart
Range	Meaning
0x00–0x1F, 0x7F	Non-printable ASCII character
0x20–0x5B, 0x5D–0x7D	Unaltered ASCII character
0x5C, 0x7E	Modified ASCII character
0xA1–0xDF	Single-byte half-width katakana
0x81–0x84, 0x88–0x9F, 0xE0–0xEA	First byte of a double-byte character, used by JIS X 0208 (and by extensions such as JIS X 0213 plane 1)
0x85–0x87, 0xEB–0xEF	First byte of a double-byte character, unallocated in JIS X 0208 but used by JIS X 0213 plane 1 or by vendor extensions
0xF0–0xFC	First byte of a double-byte character beyond JIS X 0208, used for JIS X 0213 plane 2 or for unrelated extensions
0x80, 0xA0, 0xFD–0xFF	Not used as first byte, used by some single-byte extensions

Footnotes

[a] Not in the strictest sense of the term, as ASCII bytes can appear as trail bytes.
[b] The ASCII Corporation should not be confused with the ASCII encoding used elsewhere in this article.
[c] In JIS X 0208, j₁ and j₂ are each in the range 33 (0x21) to 126 (0x7E) inclusive, i.e. 7-bit character values excluding the control characters (0–31 (0x00–0x1F) and 127 (0x7F)) and the space.

References

[1] "Character Sets". IANA.
[2] "convutf8.c". OpenSolaris. Line 305. 2008-11-12.
[3] "Additional Japanese iconv Modules". What's New in the Solaris 9 9/04 Operating Environment. Oracle Corporation.
[4] "Historical trends in the usage of character encodings for websites, January 2025". w3techs.com. Retrieved 2024-01-07.
[5] "Distribution of Character Encodings among websites that use .jp". w3techs.com. Retrieved 2024-12-10.
[6] "Distribution of Character Encodings among websites that use Japanese". w3techs.com. Retrieved 2024-12-10.
[7] "Is UTF-8 the encoding of choice for QR-codes with non ASCII chars by now?". Stack Overflow. Retrieved 2024-11-01.
[8] "QR Code features".
[9] "Encoding.WindowsCodePage Property – .NET Framework (current version)". MSDN. Microsoft.
[10] "Code Page Identifiers". Windows Dev Center. Microsoft. 7 January 2021.
[11] "IBM-943 and IBM-932". IBM Knowledge Center. IBM.
[12] "CP932.TXT". Unicode Consortium.
[13] "3.1.1 Details of Problems". Problems and Solutions for Unicode and User/Vendor Defined Characters. The Open Group Japan. Archived from the original on 1999-02-03.
[14] Kaplan, Michael S. (2005-09-17). "When is a backslash not a backslash?".
[15] Kaplan, Michael S. (2007-05-26). "The PUA outside of Unicode". Sorting it all out.
[16] "5. Indexes (§ Index jis0208)". Encoding Standard. WHATWG.
[17] "4.2. Names and labels". Encoding Standard. WHATWG.
[18] "JAPANESE.TXT: Map (external version) from Mac OS Japanese encoding to Unicode 2.1 and later". Apple Computer, Inc.; Unicode Consortium.
[19] Lunde, Ken (2019-03-21). "A Brief History of Japan's Era Name Ligatures". CJK Type Blog. Adobe Inc.
[20] "Encoding Variants for MacJapanese". Apple Developer Documentation. Apple.
[21] Lunde, Ken (2008). "Appendix E: Vendor Character Set Standards" (PDF). CJKV Information Processing. O'Reilly Media. ISBN 9780596514471.
[22] "JIS X 0213 Code Mapping Tables". x0213.org.
[23] "JIS X 0213の代表的な符号化方式 § Shift_JIS-2004" (in Japanese). Hexadecimal numbers in the source have been converted to decimal for display.
[24] Japanese Industrial Standards Committee (2004-04-13). Japanese Graphic Character Set for Information Interchange, Plane 1 (PDF). ITSCJ/IPSJ. ISO-IR-233.
[25] "Index jis0208 visualization". Encoding Standard. WHATWG.
[26] "Original Emoji from DoCoMo". FileFormat.info.
[27] "Original Emoji from KDDI". FileFormat.info.

External links

Shift-JIS Kanji Table – a table of the non-ASCII part of the codeset
"Windows Codepage 932". Microsoft. May 1, 2005. Archived from the original on 2008-03-07. – Microsoft's definition
Forms of Shift-JIS in ICU (International Components for Unicode)
