In the Unicode standard, a plane is a contiguous group of 65,536 (2^16) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10 (in hexadecimal) of the first two positions in the six-position hexadecimal format (U+hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes".[1] The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 16.0, five of the planes have assigned code points (characters), and seven are named.
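Because every plane holds exactly 2^16 code points, the plane number is simply the code point with its low 16 bits dropped. The following is a minimal sketch of that relationship; the function name `plane_of` is a hypothetical helper chosen for illustration, not part of any standard library.

```python
MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Dropping the low 16 bits leaves the first two hex positions of U+hhhhhh.
    return code_point >> 16


for cp in (0x0041, 0x1F600, 0x20000, 0xE0001, 0x10FFFF):
    print(f"U+{cp:06X} is in plane {plane_of(cp)}")
```

For example, U+1F600 is written U+01F600 in six-position form, and its first two positions (01) give plane 1.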
The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of words, plus the BMP as a single word.[2] UTF-8 was designed with a much larger limit of 2^31 (2,147,483,648) code points (32,768 planes), and would still be able to encode 2^21 (2,097,152) code points (32 planes) even under the current limit of 4 bytes.[3]
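The capacities quoted above follow from counting payload bits. The sketch below reproduces the arithmetic. The bit layouts in the comments (1,024 high and 1,024 low surrogates for UTF-16, and the lead-byte and continuation-byte patterns for UTF-8) come from the encoding definitions rather than from this article, and the variable names are illustrative only.

```python
PLANE_SIZE = 2 ** 16  # code points per plane

# UTF-16: a pair is one of 1,024 high surrogates followed by one of
# 1,024 low surrogates, so each half carries 10 bits of payload.
utf16_pair_code_points = 1024 * 1024                      # 2**20
utf16_supplementary_planes = utf16_pair_code_points // PLANE_SIZE
utf16_total_planes = utf16_supplementary_planes + 1       # plus the BMP

# UTF-8, four-byte form: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
utf8_four_byte_bits = 3 + 3 * 6                           # 21 payload bits
utf8_four_byte_planes = 2 ** utf8_four_byte_bits // PLANE_SIZE

# UTF-8, original six-byte form: 1111110x plus five continuation bytes
utf8_six_byte_bits = 1 + 5 * 6                            # 31 payload bits
utf8_six_byte_planes = 2 ** utf8_six_byte_bits // PLANE_SIZE

print(utf16_total_planes, utf8_four_byte_planes, utf8_six_byte_planes)
```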
The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are surrogates (used to make the pairs in UTF-16), 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment.
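These totals can be rederived from the underlying ranges. The sketch below assumes the standard range boundaries, which the paragraph itself does not spell out: surrogates at U+D800–U+DFFF, non-characters at U+FDD0–U+FDEF plus the last two code points of every plane, and private use at U+E000–U+F8FF plus planes 15 and 16 minus their two trailing non-characters.

```python
PLANE_SIZE = 0x10000
PLANE_COUNT = 17

total = PLANE_COUNT * PLANE_SIZE

# Surrogates: U+D800..U+DFFF
surrogates = 0xDFFF - 0xD800 + 1

# Non-characters: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each plane
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANE_COUNT

# Private use: U+E000..U+F8FF in the BMP, plus planes 15 and 16
# excluding the two non-characters at the end of each plane
private_use = (0xF8FF - 0xE000 + 1) + 2 * (PLANE_SIZE - 2)

public = total - surrogates - noncharacters - private_use

print(f"{total:,} total, {surrogates:,} surrogates, "
      f"{noncharacters} non-characters, {private_use:,} private use, "
      f"{public:,} for public assignment")
```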
Planes are further subdivided into Unicode blocks, which, unlike planes, do not have a fixed size. The 338 blocks defined in Unicode 16.0 cover 27% of the possible code point space, and range in size from a minimum of 16 code points (sixteen blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.[4]
Plane | Allocated code points (version 16.0)[note 1] | Assigned characters |
---|---|---|
0 BMP | 65,520 | 55,656 |
1 SMP | 31,424 | 28,444 |
2 SIP | 61,536 | 61,495 |
3 TIP | 9,136 | 9,131 |
14 SSP | 368 | 337 |
15 SPUA-A | 65,536 | 0 (by definition) |
16 SPUA-B | 65,536 | 0 (by definition) |
Totals | 299,056 | 155,063 |
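As a consistency check, the per-plane figures in the table can be summed and compared with the full code space. This is only a sketch over the numbers copied from the table; the dictionary name `PLANES` is illustrative.

```python
# (allocated code points, assigned characters) per plane, copied from the table
PLANES = {
    "0 BMP": (65_520, 55_656),
    "1 SMP": (31_424, 28_444),
    "2 SIP": (61_536, 61_495),
    "3 TIP": (9_136, 9_131),
    "14 SSP": (368, 337),
    "15 SPUA-A": (65_536, 0),
    "16 SPUA-B": (65_536, 0),
}

total_allocated = sum(allocated for allocated, _ in PLANES.values())
total_assigned = sum(assigned for _, assigned in PLANES.values())

CODE_SPACE = 17 * 2 ** 16  # 1,114,112 code points in all planes
coverage = total_allocated / CODE_SPACE

print(f"allocated: {total_allocated:,}")
print(f"assigned:  {total_assigned:,}")
print(f"coverage:  {coverage:.1%}")
```

The allocated total is roughly 27% of the code space, which agrees with the block coverage figure given earlier in the article.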
The first plane, plane 0, the Basic Multilingual Plane (BMP), contains characters for almost all modern languages, and a large number of symbols. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.
The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.
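The pairing works by subtracting 0x10000 from the supplementary code point and splitting the remaining 20 bits into two 10-bit halves, one added to the start of each surrogate range. The sketch below illustrates this; the function names are hypothetical helpers, and in practice Python's built-in `utf-16` codecs perform the same conversion.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points in planes 1 to 16 use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")
```

For U+1F600 this yields the pair D83D DE00, which matches the bytes produced by `chr(0x1F600).encode("utf-16-be")`.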
65,520 of the 65,536 code points in this plane have been allocated to a Unicode block, leaving just 16 code points in a single unallocated range (2FE0..2FEF).
As of Unicode 16.0, the BMP comprises 164 blocks.
Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts. It also includes English reform orthographies like Shavian and Deseret, and some modern scripts like Osage, Warang Citi, Adlam, Wancho and Toto. Symbols and notations include historic and modern musical notation; mathematical alphanumerics; shorthands; Emoji and other pictographic sets; and game symbols for playing cards, mahjong, and dominoes.
As of Unicode 16.0, the SMP comprises 161 blocks.
Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK Ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards.
As of Unicode 16.0, the SIP comprises seven blocks.
Plane 3 is the Tertiary Ideographic Plane (TIP). CJK Unified Ideographs Extension G was added to the TIP in Unicode 13.0, released in March 2020.[5] The plane is also tentatively allocated for Oracle Bone script and Small Seal Script.[6]
As of Unicode 16.0, the TIP comprises two blocks.
Planes 4 to 13 (planes 4 to D in hexadecimal): No characters have yet been assigned, or proposed for assignment, to planes 4 through 13.
Plane 14 (E in hexadecimal) is designated as the Supplementary Special-purpose Plane (SSP). As of Unicode 16.0, it comprises two blocks.
Planes 15 and 16 (planes F and 10 in hexadecimal) each contain a "Private Use Area". They contain the blocks named Supplementary Private Use Area-A (PUA-A) and Supplementary Private Use Area-B (PUA-B), respectively. The Private Use Areas are available for use by parties outside ISO and Unicode (private use character encoding).