MySQL 8.0 Reference Manual

12.9.5 The utf16 Character Set (UTF-16 Unicode Encoding)

The utf16 character set is the ucs2 character set with an extension that enables encoding of supplementary characters:

  • For a BMP character, utf16 and ucs2 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf16 has a special sequence for representing the character using 32 bits. This is called the surrogate mechanism: For a number greater than 0xffff, subtract 0x10000 to leave a 20-bit value, take the high 10 bits and add them to 0xd800 and put them in the first 16-bit word, then take the low 10 bits and add them to 0xdc00 and put them in the next 16-bit word. Consequently, all supplementary characters require 32 bits, where the first 16 bits are a number between 0xd800 and 0xdbff, and the last 16 bits are a number between 0xdc00 and 0xdfff. Examples are in Section 15.5, Surrogates Area, of the Unicode 4.0 document.
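
The surrogate arithmetic can be sketched in a few lines of Python. This is only an illustration of the encoding rule, not MySQL's internal code; the function name to_surrogate_pair is hypothetical, and the subtraction of 0x10000 before splitting the bits follows the UTF-16 definition.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) 16-bit surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # high 10 bits go in the first word
    low = 0xDC00 + (offset & 0x3FF)     # low 10 bits go in the second word
    return high, low


high, low = to_surrogate_pair(0x10384)
print(f"{high:04x} {low:04x}")  # prints "d800 df84"
```

The first word always falls between 0xd800 and 0xdbff and the second between 0xdc00 and 0xdfff, which matches the ranges given above.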

Because utf16 supports surrogates and ucs2 does not, there is a validity check that applies only in utf16: You cannot insert a top surrogate without a bottom surrogate, or vice versa. For example:

INSERT INTO t (ucs2_column) VALUES (0xd800); /* legal */
INSERT INTO t (utf16_column) VALUES (0xd800); /* illegal */
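
The pairing rule can be modeled outside the server as follows. This Python sketch assumes the value has already been split into 16-bit words; the function name has_unpaired_surrogate is hypothetical, and the code illustrates the rule rather than how MySQL implements the check.

```python
def has_unpaired_surrogate(words: list[int]) -> bool:
    """Return True if any surrogate in the 16-bit words lacks its partner."""
    i = 0
    while i < len(words):
        word = words[i]
        if 0xD800 <= word <= 0xDBFF:
            # A top surrogate must be followed by a bottom surrogate.
            if i + 1 >= len(words) or not 0xDC00 <= words[i + 1] <= 0xDFFF:
                return True
            i += 2  # skip the valid pair
        elif 0xDC00 <= word <= 0xDFFF:
            # A bottom surrogate with no preceding top surrogate.
            return True
        else:
            i += 1  # ordinary BMP character
    return False


print(has_unpaired_surrogate([0xD800]))          # True: lone top surrogate
print(has_unpaired_surrogate([0xD800, 0xDF84]))  # False: complete pair
```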

There is no validity check for characters that are technically valid but are not true Unicode (that is, characters that Unicode considers to be unassigned code points or private use characters or even "illegals" like 0xffff). For example, because U+F8FF is the Apple Logo, this is legal:

INSERT INTO t (utf16_column) VALUES (0xf8ff); /* legal */

Such characters cannot be expected to mean the same thing to everyone.
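
To see how Unicode itself classifies such code points, one option is Python's standard unicodedata module. This is a sketch for inspection only and plays no part in MySQL; the results reflect the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata

# General categories: "Co" is private use, "Cn" covers unassigned code
# points and noncharacters such as U+FFFF.
for code_point in (0xF8FF, 0xFFFF):
    category = unicodedata.category(chr(code_point))
    print(f"U+{code_point:04X} -> {category}")
```

Neither category carries an agreed meaning, yet both values are storable in a utf16 column.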

Because MySQL must allow for the worst case (that one character requires four bytes), the maximum length of a utf16 column or index is only half of the maximum length for a ucs2 column or index. For example, the maximum length of a MEMORY table index key is 3072 bytes, so these statements create tables with the longest permitted indexes for ucs2 and utf16 columns:

CREATE TABLE tf (s1 VARCHAR(1536) CHARACTER SET ucs2) ENGINE=MEMORY;
CREATE INDEX i ON tf (s1);
CREATE TABLE tg (s1 VARCHAR(768) CHARACTER SET utf16) ENGINE=MEMORY;
CREATE INDEX i ON tg (s1);
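
The column lengths in these statements follow from dividing the key limit by the worst-case bytes per character. The short Python sketch below reproduces that arithmetic; the function name max_indexed_chars is hypothetical, and the 3072-byte limit is the MEMORY figure quoted above.

```python
MAX_KEY_BYTES = 3072  # MEMORY index key limit quoted in the text


def max_indexed_chars(max_key_bytes: int, worst_case_bytes_per_char: int) -> int:
    """Longest indexable column, in characters, for a given key size limit."""
    return max_key_bytes // worst_case_bytes_per_char


print(max_indexed_chars(MAX_KEY_BYTES, 2))  # ucs2: 1536
print(max_indexed_chars(MAX_KEY_BYTES, 4))  # utf16: 768
```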