Unicode 15.0 Core Specification Bookmarks

This page contains links to sections, tables, and figures of the core specification for The Unicode Standard, Version 15.0. See Unicode 15.0.0 for full context about the Unicode Standard.

Preface
Why Unicode?
Organization of This Standard
The Unicode Character Database
Unicode Code Charts
Unicode Standard Annexes
Unicode Technical Standards and Unicode Technical Reports
Updates and Errata
Acknowledgements

1 Introduction
Figure 1-1. Wide ASCII
1.1 Coverage
Standards Coverage
New Characters
1.2 Design Goals
Figure 1-2. Unicode Compared to the 2022 Framework
1.3 Text Handling
Characters and Glyphs
Text Elements

2 General Structure
2.1 Architectural Context
Basic Text Processes
Text Elements, Characters, and Text Processes
Figure 2-1. Text Elements and Characters
Text Processes and Encoding
2.2 Unicode Design Principles
Table 2-1. The 10 Unicode Design Principles
Universality
Efficiency
Characters, Not Glyphs
Figure 2-2. Characters Versus Glyphs
Table 2-2. User-Perceived Characters with Multiple Code Points
Figure 2-3. Unicode Character Code to Rendered Glyphs
Semantics
Plain Text
Logical Order
Figure 2-4. Bidirectional Ordering
Figure 2-5. Writing Direction and Numbers
Unification
Figure 2-6. Typeface Variation for the Bone Character
Dynamic Composition
Figure 2-7. Dynamic Composition
Stability
Convertibility
2.3 Compatibility Characters
Compatibility Variants
Compatibility Decomposable Characters
2.4 Code Points and Characters
Figure 2-8. Abstract and Encoded Characters
Types of Code Points
Table 2-3. Types of Code Points
2.5 Encoding Forms
Figure 2-9. Overlap in Legacy Mixed-Width Encodings
Figure 2-10. Boundaries and Interpretation
Figure 2-11. Unicode Encoding Forms
UTF-32
UTF-16
UTF-8
2.6 Encoding Schemes
Table 2-4. The Seven Unicode Encoding Schemes
Figure 2-12. Unicode Encoding Schemes
2.7 Unicode Strings
2.8 Unicode Allocation
Planes
Allocation Areas and Blocks
Assignment of Code Points
2.9 Details of Allocation
Figure 2-13. Unicode Allocation
Plane 0 (BMP)
Figure 2-14. Allocation on the BMP
Plane 1 (SMP)
Figure 2-15. Allocation on Plane 1
Plane 2 (SIP)
Plane 3 (TIP)
Other Planes
2.10 Writing Direction
Figure 2-16. Writing Directions
2.11 Combining Characters
Figure 2-17. Combining Enclosing Marks for Symbols
Sequence of Base Characters and Combining Characters
Figure 2-18. Sequence of Base Characters and Combining Characters
Figure 2-19. Reordered Indic Vowel Signs
Figure 2-20. Properties and Combining Character Sequences
Multiple Combining Characters
Figure 2-21. Multiple Combining Characters
Table 2-5. Interaction of Combining Characters
Table 2-6. Nondefault Stacking
Ligated Multiple Base Characters
Figure 2-22. Ligated Multiple Base Characters
Exhibiting Nonspacing Marks in Isolation
“Characters” and Grapheme Clusters
2.12 Equivalent Sequences
Figure 2-23. Equivalent Sequences
Normalization
Figure 2-24. Canonical Ordering
Decompositions
Figure 2-25. Types of Decomposables
Non-decomposition of Certain Diacritics
2.13 Special Characters
Special Noncharacter Code Points
Byte Order Mark (BOM)
Layout and Format Control Characters
The Replacement Character
Control Codes
2.14 Conforming to the Unicode Standard
Characteristics of Conformant Implementations
Unacceptable Behavior
Acceptable Behavior
Supported Subsets

3 Conformance
3.1 Versions of the Unicode Standard
Stability
Version Numbering
Errata and Corrigenda
References to the Unicode Standard
Precision in Version Citation
References to Unicode Character Properties
References to Unicode Algorithms
3.2 Conformance Requirements
Code Points Unassigned to Abstract Characters
Interpretation
Modification
Character Encoding Forms
Character Encoding Schemes
Bidirectional Text
Normalization Forms
Normative References
Unicode Algorithms
Default Casing Algorithms
Unicode Standard Annexes
3.3 Semantics
Definitions
Character Identity and Semantics
3.4 Characters and Encoding
Table 3-1. Named Unicode Algorithms
3.5 Properties
Types of Properties
Property Values
Default Property Values
Classification of Properties by Their Values
Property Status
Table 3-2. Normative Character Properties
Table 3-3. Informative Character Properties
Context Dependence
Stability of Properties
Simple and Derived Properties
Property Aliases
Private Use
3.6 Combination
Combining Character Sequences
Grapheme Clusters
Application of Combining Marks
Figure 3-1. Enclosing Marks
3.7 Decomposition
Compatibility Decomposition
Canonical Decomposition
3.8 Surrogates
3.9 Unicode Encoding Forms
Table 3-4. Examples of Unicode Encoding Forms
UTF-32
UTF-16
Table 3-5. UTF-16 Bit Distribution
UTF-8
Table 3-6. UTF-8 Bit Distribution
Table 3-7. Well-Formed UTF-8 Byte Sequences
Encoding Form Conversion
Constraints on Conversion Processes
U+FFFD Substitution of Maximal Subparts
Table 3-8. U+FFFD for Non-Shortest Form Sequences
Table 3-9. U+FFFD for Ill-Formed Sequences for Surrogates
Table 3-10. U+FFFD for Other Ill-Formed Sequences
Table 3-11. U+FFFD for Truncated Sequences
3.10 Unicode Encoding Schemes
Table 3-12. Summary of UTF-16BE, UTF-16LE, and UTF-16
Table 3-13. Summary of UTF-32BE, UTF-32LE, and UTF-32
3.11 Normalization Forms
Normalization Stability
Combining Classes
Specification of Unicode Normalization Forms
Starters
Table 3-14. Combining Marks and Starter Status
Canonical Ordering Algorithm
Table 3-15. Reorderable Pairs
Canonical Composition Algorithm
Definition of Normalization Forms
3.12 Conjoining Jamo Behavior
Definitions
Hangul Syllable Decomposition
Table 3-16. Hangul Characters Used in Examples
Hangul Syllable Composition
Hangul Syllable Name Generation
Sample Code for Hangul Algorithms
3.13 Default Case Algorithms
Definitions
Table 3-17. Context Specification for Casing
Default Case Conversion
Default Case Folding
Default Case Detection
Table 3-18. Case Detection Examples
Default Caseless Matching

4 Character Properties
4.1 Unicode Character Database
4.2 Case
Definitions of Case and Casing
Table 4-1. Relationship of Casing Definitions
Table 4-2. Case Function Values for Strings
Case Mapping
Table 4-3. Sources for Case Mapping Information
4.3 Combining Classes
Figure 4-1. Positions of Common Combining Marks
Reordrant, Split, and Subjoined Combining Marks
4.4 Directionality
4.5 General Category
Table 4-4. General Category
4.6 Numeric Value
Ideographic Numeric Values
Table 4-5. Primary Numeric Ideographs
Table 4-6. Ideographs Used as Accounting Numbers
4.7 Bidi Mirrored
4.8 Name
Table 4-7. Types of Character Name Aliases
Unicode Name Property
Table 4-8. Name Derivation Rule Prefix Strings
Code Point Labels
Table 4-9. Construction of Code Point Labels
Use of Character Names in APIs and User Interfaces
4.9 Unicode 1.0 Names
4.10 Letters, Alphabetic, and Ideographic
4.11 Properties for Text Boundaries
4.12 Characters with Unusual Properties
Table 4-10. Unusual Properties

5 Implementation Guidelines
5.1 Data Structures for Character Conversion
Issues
Multistage Tables
Figure 5-1. Two-Stage Tables
5.2 Programming Languages and Data Types
Unicode Data Types for C
5.3 Unknown and Missing Characters
5.4 Handling Surrogate Pairs in UTF-16
5.5 Handling Numbers
5.6 Normalization
Figure 5-2. Normalization
5.7 Compression
5.8 Newline Guidelines
Definitions
Table 5-1. Hex Values for Acronyms
Table 5-2. NLF Platform Correlations
Line Separator and Paragraph Separator
Recommendations
5.9 Regular Expressions
5.10 Language Information in Plain Text
Requirements for Language Tagging
Language Tags and Han Unification
5.11 Editing and Selection
Consistent Text Elements
Figure 5-3. Consistent Character Boundaries
5.12 Strategies for Handling Nonspacing Marks
Keyboard Input
Figure 5-4. Dead Keys Versus Handwriting Sequence
Truncation
Figure 5-5. Truncating Grapheme Clusters
5.13 Rendering Nonspacing Marks
Figure 5-6. Inside-Out Rule
Figure 5-7. Fallback Rendering
Figure 5-8. Bidirectional Placement
Figure 5-9. Justification
Canonical Equivalence
Table 5-3. Typing Order Differing from Canonical Order
Table 5-4. Permuting Combining Class Weights
Positioning Methods
Figure 5-10. Positioning with Ligatures
Figure 5-11. Positioning with Contextual Forms
Figure 5-12. Positioning with Enhanced Kerning
5.14 Locating Text Element Boundaries
5.15 Identifiers
5.16 Sorting and Searching
Culturally Expected Sorting and Searching
Language-Insensitive Sorting
Searching
Sublinear Searching
Figure 5-13. Sublinear Searching
5.17 Binary Order
UTF-8 in UTF-16 Order
UTF-16 in UTF-8 Order
5.18 Case Mappings
Titlecasing
Complications for Case Mapping
Figure 5-14. Uppercase Mapping for Turkish I
Figure 5-15. Lowercase Mapping for Turkish I
Figure 5-16. Casing of German Sharp S
Reversibility
Caseless Matching
Normalization and Casing
Table 5-5. Casing and Normalization in Strings
5.19 Mapping Compatibility Variants
5.20 Unicode Security
5.21 Ignoring Characters in Processing
Characters Ignored in Text Segmentation
Characters Ignored in Line Breaking
Characters Ignored in Cursive Joining
Characters Ignored in Identifiers
Characters Ignored in Searching and Sorting
Characters Ignored for Display
5.22 U+FFFD Substitution in Conversion

6 Writing Systems and Punctuation
6.1 Writing Systems
Figure 6-1. Overriding Inherent Vowels
Table 6-1. Typology of Scripts in the Unicode Standard
6.2 General Punctuation
Figure 6-2. Forms of CJK Punctuation
Blocks Devoted to Punctuation
Format Control Characters
Space Characters
Table 6-2. Unicode Space Characters
Dashes and Hyphens
Table 6-3. Unicode Dash Characters
Paired Punctuation
Language-Based Usage of Quotation Marks
Figure 6-3. European Quotation Marks
Table 6-4. Models of Visual Relationship between Quote Glyphs
Table 6-5. East Asian Quotation Marks
Figure 6-4. Asian Quotation Marks
Table 6-6. Opening and Closing Forms
Apostrophes
Other Punctuation
Table 6-7. Names for the @
Archaic Punctuation and Editorial Marks
Figure 6-5. Examples of Ancient Greek Editorial Marks
Figure 6-6. Use of Greek Paragraphos
Indic Punctuation
Table 6-8. Unicode Danda Characters
CJK Punctuation
Figure 6-7. CJK Parentheses
Unknown or Unavailable Ideographs
CJK Compatibility Forms

7 Europe-I
7.1 Latin
Figure 7-1. Alternative Glyphs in Latin
Table 7-1. Preferred Rendering of Cedilla versus Comma Below
Figure 7-2. Diacritics on i and j
Figure 7-3. Vietnamese Letters and Tone Marks
Letters of Basic Latin: U+0041–U+007A
Letters of the Latin-1 Supplement: U+00C0–U+00FF
Latin Extended-A: U+0100–U+017F
Latin Extended-B: U+0180–U+024F
IPA Extensions: U+0250–U+02AF
Phonetic Extensions: U+1D00–U+1DBF
Latin Extended Additional: U+1E00–U+1EFF
Latin Extended-C: U+2C60–U+2C7F
Latin Extended-D: U+A720–U+A7FF
Latin Extended-E: U+AB30–U+AB6F
Latin Extended-F: U+10780–U+107BF
Latin Ligatures: U+FB00–U+FB06
7.2 Greek
Greek: U+0370–U+03FF
Table 7-2. Nonspacing Marks Used with Greek
Figure 7-4. Variations in Greek Capital Letter Upsilon
Greek Extended: U+1F00–U+1FFF
Table 7-3. Greek Spacing and Nonspacing Pairs
Ancient Greek Numbers: U+10140–U+1018F
7.3 Coptic
Figure 7-5. Coptic Numerals
7.4 Cyrillic
Cyrillic: U+0400–U+04FF
Cyrillic Supplement: U+0500–U+052F
Cyrillic Extended-A: U+2DE0–U+2DFF
Figure 7-6. Combination of Titlo Letters
Cyrillic Extended-B: U+A640–U+A69F
Cyrillic Extended-C: U+1C80–U+1C8F
Cyrillic Extended-D: U+1E030–U+1E08F
7.5 Glagolitic
Glagolitic: U+2C00–U+2C5F
Glagolitic Supplement: U+1E000–U+1E02F
7.6 Armenian
7.7 Georgian
Georgian: U+10A0–U+10FF
Georgian Extended: U+1C90–U+1CBF
Georgian Supplement: U+2D00–U+2D2F
Figure 7-7. Georgian Scripts and Casing
7.8 Modifier Letters
Spacing Modifier Letters: U+02B0–U+02FF
Figure 7-8. Tone Letters
Modifier Tone Letters: U+A700–U+A71F
7.9 Combining Marks
Figure 7-9. Double Diacritics
Figure 7-10. Positioning of Double Diacritics
Figure 7-11. Use of CGJ with Double Diacritics
Figure 7-12. Interaction of Combining Marks with Ligatures
Combining Diacritical Marks: U+0300–U+036F
Combining Diacritical Marks Extended: U+1AB0–U+1AFF
Figure 7-13. Positioning of Combining Parentheses
Combining Diacritical Marks Supplement: U+1DC0–U+1DFF
Table 7-4. Typicon Kavyka Symbols
Combining Diacritical Marks for Symbols: U+20D0–U+20FF
Figure 7-14. Use of Vertical Line Overlay for Negation
Combining Half Marks: U+FE20–U+FE2F
Figure 7-15. Double Diacritics and Half Marks
Combining Marks in Other Blocks

8 Europe-II
8.1 Linear A
8.2 Linear B
Linear B Syllabary: U+10000–U+1007F
Linear B Ideograms: U+10080–U+100FF
Aegean Numbers: U+10100–U+1013F
8.3 Cypriot Syllabary
Table 8-1. Similar Characters in Linear B and Cypriot
8.4 Cypro-Minoan
8.5 Ancient Anatolian Alphabets
Lycian: U+10280–U+1029F
Carian: U+102A0–U+102DF
Lydian: U+10920–U+1093F
8.6 Old Italic
Figure 8-1. Distribution of Old Italic
8.7 Runic
8.8 Old Hungarian
8.9 Gothic
8.10 Elbasan
8.11 Caucasian Albanian
8.12 Vithkuqi
8.13 Old Permic
Table 8-2. Combining Marks Used in Old Permic
8.14 Ogham
8.15 Shavian

9 Middle East-I
9.1 Hebrew
Hebrew: U+0590–U+05FF
Alphabetic Presentation Forms: U+FB1D–U+FB4F
9.2 Arabic
Arabic: U+0600–U+06FF
Figure 9-1. Directionality and Cursive Connection
Figure 9-2. Using a Joiner
Figure 9-3. Using a Non-joiner
Figure 9-4. Combinations of Joiners and Non-joiners
Figure 9-5. Placement of Harakat
Figure 9-6. Dammatan Styles
Table 9-1. Arabic Digit Names
Table 9-2. Glyph Variation in Eastern Arabic-Indic Digits
Figure 9-7. Arabic Signs Spanning Numbers
Arabic Cursive Joining
Table 9-3. Primary Arabic Joining Types
Table 9-4. Derived Arabic Joining Types
Table 9-5. Arabic Glyph Types
Arabic Ligatures
Figure 9-8. Lam-alef with Marks
Table 9-6. Arabic Ligature Notation
Arabic Joining Groups
Table 9-7. Dual-Joining Arabic Characters
Table 9-8. Right-Joining Arabic Characters
Table 9-9. Letter heh Shapes
Table 9-10. Forms of the Arabic Letter yeh
Table 9-11. Glyph Variation for U+0626 Yeh with Hamza Above
Combining Hamza
Table 9-12. Arabic Letters With Hamza Above
Other Letters for Extended Arabic
Arabic Letters in Other Languages
Table 9-13. Glyph Variation for U+0645 Meem
Arabic Supplement: U+0750–U+077F
Arabic Extended-A: U+08A0–U+08FF
Arabic Presentation Forms-A: U+FB50–U+FDFF
Arabic Presentation Forms-B: U+FE70–U+FEFF
9.3 Syriac
Syriac: U+0700–U+074F
Figure 9-9. Syriac Abbreviation
Figure 9-10. Use of SAM
Table 9-14. Miscellaneous Syriac Diacritic Use
Syriac Shaping
Table 9-15. Syriac Final Alaph Glyph Types
Table 9-16. Dual-Joining Syriac Characters
Table 9-17. Right-Joining Syriac Characters
Table 9-18. Syriac Alaph Glyph Forms
Table 9-19. Syriac Ligatures
Syriac Supplement: U+0860–U+086F
9.4 Samaritan
Table 9-20. Samaritan Performative Punctuation Marks
9.5 Mandaic
Table 9-21. Dual-Joining Mandaic Characters
Table 9-22. Right-Joining Mandaic Characters
9.6 Yezidi

10 Middle East-II
10.1 Old North Arabian
10.2 Old South Arabian
Table 10-1. Old South Arabian Numeric Characters
Table 10-2. Number Formation in Old South Arabian
10.3 Phoenician
10.4 Imperial Aramaic
Table 10-3. Number Formation in Aramaic
10.5 Manichaean
Table 10-4. Dual-Joining Manichaean Letters
Table 10-5. Right-Joining Manichaean Letters
Table 10-6. Left-Joining Manichaean Letters
Table 10-7. Non-Joining Manichaean Letters
Table 10-8. Manichaean Ligatures
10.6 Pahlavi and Parthian
Inscriptional Parthian: U+10B40–U+10B5F
Inscriptional Pahlavi: U+10B60–U+10B7F
Table 10-9. Inscriptional Parthian Shaping Behavior
Psalter Pahlavi: U+10B80–U+10BAF
10.7 Avestan
Table 10-10. Avestan Shaping Behavior
10.8 Chorasmian
10.9 Elymaic
10.10 Nabataean
10.11 Palmyrene
10.12 Hatran

11 Cuneiform and Hieroglyphs
11.1 Sumero-Akkadian
Cuneiform: U+12000–U+123FF
Table 11-1. Cuneiform Script Usage
Cuneiform Numbers and Punctuation: U+12400–U+1247F
Early Dynastic Cuneiform: U+12480–U+1254F
11.2 Ugaritic
11.3 Old Persian
11.4 Egyptian Hieroglyphs
Egyptian Hieroglyphs: U+13000–U+1342F
Egyptian Hieroglyph Format Controls: U+13430–U+1345F
Figure 11-1. Vertical and Horizontal Formatting of Hieroglyphs
Figure 11-2. Insertion and Overlay Formatting of Hieroglyphs
Figure 11-3. Use of U+13439 to Insert at Middle
Figure 11-4. Rendering Enclosures
Figure 11-5. Complex Cluster Formatting of Hieroglyphs
Table 11-2. Complex Hieroglyphs and Nonequivalent Sequences
Figure 11-6. Rotation of Hieroglyphs
Editorial Marks
Figure 11-7. Use of Blanks
Figure 11-8. Use of Lost Signs
Figure 11-9. Damage Modifiers for Hieroglyphs
Table 11-3. Brackets used with Egyptian Hieroglyphs
Figure 11-10. Use of Square Brackets with Hieroglyphs
11.5 Meroitic
11.6 Anatolian Hieroglyphs

12 South and Central Asia-I
12.1 Devanagari
Devanagari: U+0900–U+097F
Principles of the Devanagari Script
Table 12-1. Devanagari Vowel Letters
Figure 12-1. Dead Consonants in Devanagari
Table 12-2. Devanagari Atomic Consonants
Figure 12-2. Conjunct Formations in Devanagari
Figure 12-3. Multi-Consonant Conjuncts in Devanagari
Table 12-3. Devanagari Consonant Conjuncts
Figure 12-4. Preventing Conjunct Forms in Devanagari
Figure 12-5. Half-Consonants in Devanagari
Figure 12-6. Independent Half-Forms in Devanagari
Figure 12-7. Half-Consonants in Oriya
Figure 12-8. Consonant Forms in Devanagari and Oriya
Rendering Devanagari
Figure 12-9. Rendering Order in Devanagari
Table 12-4. Sample Devanagari Half-Forms
Table 12-5. Sample Devanagari Ligatures
Table 12-6. RA + Vocalic Letter Ligature Forms
Table 12-7. Sample Devanagari Half-Ligature Forms
Table 12-8. Marathi and Nepali Allographs
Devanagari Digits, Punctuation, and Symbols
Extensions in the Main Devanagari Block
Figure 12-10. Use of Apostrophe in Bodo, Dogri and Maithili
Figure 12-11. Use of Avagraha in Dogri
Table 12-9. Devanagari Vowels Used in Bihari Languages
Table 12-10. Prishthamatra Orthography
Devanagari Extended: U+A8E0–U+A8FF
Devanagari Extended-A: U+11B00–U+11B5F
Vedic Extensions: U+1CD0–U+1CFF
12.2 Bengali (Bangla)
Table 12-11. Bangla Vowel Letters
Table 12-12. Diphthong Vowel Letters in Kokborok
Table 12-13. Assamese Consonant-Vowel Combinations
Table 12-14. Bangla Consonant-Vowel Combinations
Figure 12-12. Requesting Bangla Consonant-Vowel Ligature
Figure 12-13. Blocking Bangla Consonant-Vowel Ligature
Figure 12-14. Bangla Syllable tta
Table 12-15. Use of Apostrophe in Bangla
12.3 Gurmukhi
Table 12-16. Gurmukhi Vowel Letters
Table 12-17. Gurmukhi Conjuncts
Table 12-18. Additional Pairin and Addha Forms in Gurmukhi
Table 12-19. Use of Joiners in Gurmukhi
12.4 Gujarati
Table 12-20. Gujarati Vowel Letters
Table 12-21. Gujarati Conjuncts
12.5 Oriya (Odia)
Table 12-22. Oriya Vowel Letters
Table 12-23. Oriya Conjuncts
Table 12-24. Oriya Vowel Placement
Table 12-25. Ligation for the Syllable om
12.6 Tamil
Tamil: U+0B80–U+0BFF
Figure 12-15. Kssa Ligature in Tamil
Tamil Vowels
Table 12-26. Tamil Vowel Letters
Figure 12-16. Tamil Vowel Reordering
Figure 12-17. Tamil Two-Part Vowels
Figure 12-18. Tamil Vowel Splitting and Reordering
Figure 12-19. Vowel Reordering Around a Tamil Conjunct
Figure 12-20. Confusable Vowels in Tamil
Tamil Ligatures
Figure 12-21. Tamil Ligatures with i
Table 12-27. Tamil Ligatures with u
Figure 12-22. Spacing Forms of Tamil u
Figure 12-23. Tamil Ligatures with ra
Figure 12-24. Tamil Ligatures for shri
Figure 12-25. Traditional Tamil Ligatures with aa
Figure 12-26. Traditional Tamil Ligatures with o
Figure 12-27. Traditional Tamil Ligatures with ai
Figure 12-28. Vowel ai in Modern Tamil
Table 12-28. Confusable Tamil Digits
Tamil Supplement: U+11FC0–U+11FFF
Tamil Named Character Sequences
Table 12-29. Tamil Vowels, Consonants, and Syllables
12.7 Telugu
Table 12-30. Telugu Vowels
12.8 Kannada
Kannada: U+0C80–U+0CFF
Principles of the Kannada Script
Table 12-31. Kannada Vowel Letters
Figure 12-29. Indicating Retroflexion in Badaga Vowels
Rendering Kannada
12.9 Malayalam
Malayalam: U+0D00–U+0D7F
Table 12-32. Malayalam Vowel Letters
Malayalam Orthographic Reform
Table 12-33. Malayalam Orthographic Reform
Rendering Malayalam
Table 12-34. Malayalam Conjuncts
Table 12-35. Candrakkala Examples
Table 12-36. Use of Joiners in Malayalam
Table 12-37. Malayalam Conjuncts Involving Chillus
Table 12-38. Malayalam /sasa/ and /uua/
Table 12-39. Malayalam /ṉṟa/ and /ṉṯa/
Table 12-40. Legacy Encoding of Malayalam Chillus
Malayalam Numbers and Punctuation

13 South and Central Asia-II
13.1 Thaana
Table 13-1. Thaana Glyph Placement
13.2 Sinhala
Sinhala: U+0D80–U+0DFF
Table 13-2. Sinhala Vowel Letters
Table 13-3. Sinhala Named Character Sequences
Table 13-4. Sinhala Ligated Conjuncts
Table 13-5. Irregular Vowel Sign Ligatures of Sinhala
Sinhala Archaic Numbers: U+111E0–U+111FF
13.3 Newa
Table 13-6. Murmured Resonants in Nepal Bhasa
13.4 Tibetan
Figure 13-1. Tibetan Syllable Structure
Figure 13-2. Justifying Tibetan Tseks
13.5 Mongolian
Mongolian: U+1800–U+18AF
Table 13-7. Letter Usage in Mongolian Writing Systems
Figure 13-3. Mongolian Glyph Convergence
Figure 13-4. Mongolian Consonant Ligation
Figure 13-5. Mongolian Positional Forms
Figure 13-6. Mongolian Free Variation Selector
Figure 13-7. Mongolian Gender Forms
Figure 13-8. Mongolian Vowel Separator
Mongolian Supplement: U+11660–U+1167F
13.6 Limbu
Table 13-8. Positions of Limbu Combining Characters
13.7 Meetei Mayek
Meetei Mayek: U+ABC0–U+ABFF
Meetei Mayek Extensions: U+AAE0–U+AAF6
13.8 Mro
13.9 Warang Citi
13.10 Ol Chiki
13.11 Chakma
13.12 Lepcha
Table 13-9. Lepcha Syllabic Structure
13.13 Saurashtra
13.14 Masaram Gondi
Figure 13-9. Masaram Gondi Consonant Clusters
Figure 13-10. Rendering of ra in Masaram Gondi
Table 13-10. Various Signs in Masaram Gondi
13.15 Gunjala Gondi
Figure 13-11. Gunjala Gondi Conjunct Formation
13.16 Wancho
13.17 Toto
13.18 Tangsa

14 South and Central Asia-III
14.1 Brahmi
Table 14-1. Brahmi Vowel Letters
Figure 14-1. Consonant Ligatures in Brahmi
Table 14-2. Brahmi Positional Digits
14.2 Kharoshthi
Kharoshthi: U+10A00–U+10A5F
Figure 14-2. Geographical Extent of the Kharoshthi Script
Figure 14-3. Kharoshthi Number 1996
Rendering Kharoshthi
Figure 14-4. Kharoshthi Rendering Example
Table 14-3. Kharoshthi Vowel Signs
Table 14-4. Kharoshthi Vowel Modifiers
Table 14-5. Kharoshthi Consonant Modifiers
Table 14-6. Examples of Kharoshthi Virama
Figure 14-5. Subjoined Forms of ya
14.3 Bhaiksuki
14.4 Phags-pa
Figure 14-6. Phags-pa Syllable Om
Table 14-7. Phags-pa Positional Forms of I, U, E, and O
Table 14-8. Contextual Glyph Mirroring in Phags-pa
Table 14-9. Phags-pa Standardized Variants
Figure 14-7. Phags-pa Reversed Shaping
14.5 Marchen
14.6 Zanabazar Square
Figure 14-8. Conjunct Stacking in Zanabazar Square
14.7 Soyombo
14.8 Old Turkic
14.9 Old Sogdian
14.10 Sogdian
14.11 Old Uyghur

15 South and Central Asia-IV
15.1 Syloti Nagri
15.2 Kaithi
15.3 Sharada
15.4 Takri
Table 15-1. Takri Vowel Letters
15.5 Siddham
Figure 15-1. Siddham Consonant Cluster
Table 15-2. Siddham Punctuation Characters
15.6 Mahajani
15.7 Khojki
Table 15-3. Khojki Vowels
15.8 Nag Mundari
15.9 Khudawadi
Table 15-4. Khudawadi Vowel Letters
Table 15-5. Representation of Arabic Sounds in Khudawadi
15.10 Multani
15.11 Tirhuta
Table 15-6. Tirhuta Vowel Letters
15.12 Modi
Table 15-7. Modi Vowel Letters
Figure 15-2. Modi Shaping for ra
15.13 Nandinagari
Nandinagari: U+119A0–U+119FF
15.14 Grantha
Grantha: U+11300–U+1137F
Rendering Grantha
Figure 15-3. Splitting Large Conjunct Stacks in Grantha
Table 15-8. Rendering of Explicit Virama Forms in Grantha
Table 15-9. Additional Svara Marks used in Grantha
15.15 Dives Akuru
15.16 Ahom
15.17 Sora Sompeng
15.18 Dogra

16 Southeast Asia-I
16.1 Thai
Table 16-1. Glyph Positions in Thai Syllables
16.2 Lao
Table 16-2. Glyph Positions in Lao Syllables
Table 16-3. Additional Characters for Pali and Sanskrit
16.3 Myanmar
Myanmar: U+1000–U+109F
Table 16-4. Modern Burmese Syllabic Structure
Myanmar Extended-A: U+AA60–U+AA7F
Khamti Shan
Table 16-5. Khamti Shan Tone Marks
Aiton and Phake
Myanmar Extended-B: U+A9E0–U+A9FF
16.4 Khmer
Khmer: U+1780–U+17FF
Principles of the Khmer Script
Table 16-6. Independent Khmer Vowel Characters
Table 16-7. Two Registers of Khmer Consonants
Table 16-8. Khmer Subscript Consonant Signs
Table 16-9. Khmer Composite Dependent Vowel Signs with Nikahit
Table 16-10. Khmer Subscript Independent Vowel Signs
Figure 16-1. Common Ligatures in Khmer
Figure 16-2. Common Multiple Forms in Khmer
Figure 16-3. Examples of Syllabic Order in Khmer
Figure 16-4. Ligation in Muul Style in Khmer
Khmer Symbols: U+19E0–U+19FF
16.5 Tai Le
Table 16-11. Tai Le Tone Marks
Table 16-12. Myanmar Digits in Tai Le
16.6 New Tai Lue
Table 16-13. New Tai Lue Vowel Placement
Table 16-14. New Tai Lue Registers and Tones
16.7 Tai Tham
16.8 Tai Viet
Table 16-15. Tai Viet Symbols and Punctuation
16.9 Kayah Li
16.10 Cham
Table 16-16. Cham Syllabic Structure
16.11 Pahawh Hmong
Figure 16-5. Pahawh Hmong Syllable Structure
16.12 Nyiakeng Puachue Hmong
16.13 Pau Cin Hau
16.14 Hanifi Rohingya

17 Southeast Asia-II
17.1 Philippine Scripts
Tagalog: U+1700–U+171F
Hanunóo: U+1720–U+173F
Buhid: U+1740–U+175F
Tagbanwa: U+1760–U+177F
Principles of the Philippine Scripts
Table 17-1. Hanunóo and Buhid Vowel Sign Combinations
17.2 Buginese
Figure 17-1. Buginese Ligature
17.3 Balinese
Table 17-2. Balinese Base Consonants and Conjunct Forms
Table 17-3. Sasak Extensions for Balinese
Figure 17-2. Writing dharma in Balinese
Table 17-4. Balinese Consonant Clusters with u and u:
17.4 Javanese
Figure 17-3. Representation of Javanese Two-Part Vowels
17.5 Rejang
17.6 Batak
17.7 Sundanese
Sundanese: U+1B80–U+1BBF
Table 17-5. Modern Sundanese Syllabic Structure
Sundanese Supplement: U+1CC0–U+1CCF
17.8 Makasar
17.9 Kawi
Table 17-6. Kawi Base Consonants and Conjunct Forms
Table 17-7. Kawi Independent Vowels with Composite Representations
Table 17-8. Kawi Vocalic Liquids with Conjunct Forms
Table 17-9. Kawi Dependent Vowels with Composite Representations

18 East Asia
18.1 Han
CJK Unified Ideographs
Blocks Containing Han Ideographs
Table 18-1. Blocks Containing Han Ideographs
Table 18-2. Small Extensions to CJK Blocks
General Characteristics of Han Ideographs
Table 18-3. Common Han Characters
Figure 18-1. Han Spelling
Figure 18-2. Semantic Context for Han Characters
Principles of Han Unification
Figure 18-3. Three-Dimensional Conceptual Model
Unification Rules
Figure 18-4. CJK Source Separation
Table 18-4. Source Encoding for Sword Variants
Figure 18-5. Not Cognates, Not Unified
Abstract Shape
Figure 18-6. Ideographic Component Structure
Figure 18-7. The Most Superior Node of an Ideographic Component
Table 18-5. Ideographs Not Unified
Table 18-6. Ideographs Unified
Han Ideograph Arrangement
Table 18-7. Han Ideograph Arrangement
Radical-Stroke Indices
Mappings for Han Ideographs
CJK Compatibility Ideographs: U+F900–U+FAFF
CJK Compatibility Supplement: U+2F800–U+2FA1D
Kanbun: U+3190–U+319F
Symbols Derived from Han Ideographs
CJK and Kangxi Radicals: U+2E80–U+2FD5
CJK Additions from HKSCS and GB 18030
CJK Strokes: U+31C0–U+31EF
Ideographic Symbols and Punctuation: U+16FE0–U+16FFF
18.2 Ideographic Description Characters
Figure 18-8. Examples of Ideographic Description Characters
Figure 18-9. Using the Ideographic Description Characters
18.3 Bopomofo
Table 18-8. Mandarin Tone Marks
Table 18-9. Minnan and Hakka Tone Marks
18.4 Hiragana and Katakana
Hiragana: U+3040–U+309F
Katakana: U+30A0–U+30FF
Katakana Phonetic Extensions: U+31F0–U+31FF
Small Kana Extension: U+1B130–U+1B16F
Kana Supplement: U+1B000–U+1B0FF
Kana Extended-A: U+1B100–U+1B12F
Figure 18-10. Japanese Historic Kana for e and ye
Figure 18-11. Hentaigana Distinct Parent Ideographs
Figure 18-12. Other Hentaigana Examples
Kana Extended-B: U+1AFF0–U+1AFFF
Figure 18-13. Vertical Layout with Interlineation
Figure 18-14. Vertical Layout without Interlineation
Figure 18-15. Horizontal Layout with Interlineation
Figure 18-16. Horizontal Layout without Interlineation
18.5 Halfwidth and Fullwidth Forms
18.6 Hangul
Hangul Jamo: U+1100–U+11FF
Hangul Jamo Extended-A: U+A960–U+A97F
Hangul Jamo Extended-B: U+D7B0–U+D7FF
Hangul Compatibility Jamo: U+3130–U+318F
Table 18-10. Separating Jamo Characters
Hangul Syllables: U+AC00–U+D7A3
Table 18-11. Line-Based Placement of Jungseong
18.7 Yi
18.8 Nüshu
18.9 Lisu
Table 18-12. Lisu Tone Letters
Table 18-13. Punctuation Adopted in Lisu Orthography
18.10 Miao
18.11 Tangut
Tangut: U+17000–U+187FF
Tangut Supplement: U+18D00–U+18D8F
Tangut Components: U+18800–U+18AFF
18.12 Khitan Small Script
Figure 18-17. Cluster Patterns in Khitan Small Script

19 Africa
19.1 Ethiopic
Ethiopic: U+1200–U+137F
Table 19-1. Labialized Forms in Ethiopic -WAA
Table 19-2. Labialized Forms in Ethiopic -WE
Ethiopic Extensions
19.2 Osmanya
19.3 Tifinagh
Figure 19-1. Tifinagh Contextual Shaping
Figure 19-2. Tifinagh Consonant Joiner and Bi-consonants
19.4 N’Ko
Table 19-3. N’Ko Diacritic Usage
Table 19-4. N’Ko Tone Diacritics on Vowels
Figure 19-3. Examples of N’Ko Ordinals
Table 19-5. N’Ko Letter Shaping
19.5 Vai
19.6 Bamum
Bamum: U+A6A0–U+A6FF
Bamum Supplement: U+16800–U+16A3F
19.7 Bassa Vah
19.8 Mende Kikakui
Table 19-6. Number Formation in Mende Kikakui
19.9 Adlam
19.10 Medefaidrin

20 Americas
20.1 Cherokee
20.2 Canadian Aboriginal Syllabics
Unified Canadian Aboriginal Syllabics: U+1400–U+167F
Figure 20-1. Position of Carrier Syllable Finals
Unified Canadian Aboriginal Syllabics Extended: U+18B0–U+18FF
Unified Canadian Aboriginal Syllabics Extended-A: U+11AB0–U+11ABF
20.3 Osage
Table 20-1. Combining Marks used in Osage
20.4 Deseret
Figure 20-2. Short Words Equivalent to Deseret Letter Names
Table 20-2. IPA Transcription of Deseret

21 Notational Systems
21.1 Braille
21.2 Western Musical Symbols
Figure 21-1. Examples of Specialized Music Layout
Figure 21-2. Precomposed Note Characters
Figure 21-3. Alternative Noteheads
Figure 21-4. Augmentation Dots and Articulation Symbols
Table 21-1. Examples of Ornamentation
21.3 Byzantine Musical Symbols
21.4 Znamenny Musical Notation
21.5 Ancient Greek Musical Notation
Table 21-2. Representation of Ancient Greek Vocal and Instrumental Notation
21.6 Duployan
Duployan: U+1BC00–U+1BC9F
Shorthand Format Controls: U+1BCA0–U+1BCAF
21.7 Sutton SignWriting
Sutton SignWriting: U+1D800–U+1DAAF

22 Symbols
22.1 Currency Symbols
Figure 22-1. Alternative Glyphs for Dollar Sign
Currency Symbols: U+20A0–U+20CF
Table 22-1. Currency Symbols Encoded in Other Blocks
22.2 Letterlike Symbols
Letterlike Symbols: U+2100–U+214F
Figure 22-2. Alternative Glyphs for Numero Sign
Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF
Mathematical Alphabets
Figure 22-3. Wide Mathematical Accents
Figure 22-4. Style Variants and Semantic Distinctions in Mathematics
Table 22-2. Mathematical Alphanumeric Symbols
Fonts Used for Mathematical Alphabets
Figure 22-5. Easily Confused Shapes for Mathematical Glyphs
Arabic Mathematical Alphabetic Symbols: U+1EE00–U+1EEFF
22.3 Numerals
Decimal Digits
Table 22-3. Script-Specific Decimal Digits
Figure 22-6. CJK Ideographic Numbers
Other Digits
Table 22-4. Compatibility Digits
Figure 22-7. Regular and Old Style Digits
Non-Decimal Radix Systems
Acrophonic Systems and Other Letter-based Numbers
Coptic Epact Numbers: U+102E0–U+102FF
Rumi Numeral Symbols: U+10E60–U+10E7E
Siyaq Numerical Notation Systems
CJK Numerals
Fractions
Figure 22-8. Alternate Forms of Vulgar Fractions
Common Indic Number Forms: U+A830–U+A83F
22.4 Superscript and Subscript Symbols
Superscripts and Subscripts: U+2070–U+209F
22.5 Mathematical Symbols
Mathematical Operators: U+2200–U+22FF
Table 22-5. Mathematical Operators Disunified from Punctuation
Supplements to Mathematical Symbols and Arrows
Supplemental Mathematical Operators: U+2A00–U+2AFF
Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF
Miscellaneous Mathematical Symbols-B: U+2980–U+29FF
Miscellaneous Symbols and Arrows: U+2B00–U+2B7F
Arrows: U+2190–U+21FF
Supplemental Arrows
Standardized Variants of Mathematical Symbols
22.6 Invisible Mathematical Operators
22.7 Technical Symbols
Control Pictures: U+2400–U+243F
Miscellaneous Technical: U+2300–U+23FF
Figure 22-9. Usage of Crops and Quine Corners
Table 22-6. Use of Mathematical Symbol Pieces
Figure 22-10. Usage of the Decimal Exponent Symbol
Optical Character Recognition: U+2440–U+245F
Symbols for Legacy Computing: U+1FB00–U+1FBFF
22.8 Geometrical Symbols
Box Drawing and Block Elements
Geometric Shapes: U+25A0–U+25FF
Geometric Shapes Extended: U+1F780–U+1F7FF
Table 22-7. Geometric Shape Collections
22.9 Miscellaneous Symbols
Table 22-8. Blocks with Characters Often Shown as Emoji
Miscellaneous Symbols and Pictographs
Emoticons: U+1F600–U+1F64F
Transport and Map Symbols: U+1F680–U+1F6FF
Dingbats: U+2700–U+27BF
Ornamental Dingbats: U+1F650–U+1F67F
Alchemical Symbols: U+1F700–U+1F77F
Mahjong Tiles: U+1F000–U+1F02F
Domino Tiles: U+1F030–U+1F09F
Playing Cards: U+1F0A0–U+1F0FF
Chess Symbols: U+1FA00–U+1FA6F
Yijing Hexagram Symbols: U+4DC0–U+4DFF
Tai Xuan Jing Symbols: U+1D300–U+1D356
Ancient Symbols: U+10190–U+101CF
Phaistos Disc Symbols: U+101D0–U+101FF
22.10 Enclosed and Square
Enclosed Alphanumerics: U+2460–U+24FF
Enclosed CJK Letters and Months: U+3200–U+32FF
CJK Compatibility: U+3300–U+33FF
Table 22-9. Japanese Era Names
Enclosed Alphanumeric Supplement: U+1F100–U+1F1FF
Enclosed Ideographic Supplement: U+1F200–U+1F2FF

23 Special Areas and Format Characters
23.1 Control Codes
Representing Control Sequences
Specification of Control Code Semantics
Table 23-1. Control Codes Specified in the Unicode Standard
23.2 Layout Controls
Line and Word Breaking
Table 23-2. Letter Spacing
Cursive Connection and Ligatures
Figure 23-1. Prevention of Joining
Figure 23-2. Exhibition of Joining Glyphs in Isolation
Figure 23-3. Effect of Intervening Joiners
Prepended Concatenation Marks
Combining Grapheme Joiner
Bidirectional Ordering Controls
Table 23-3. Bidirectional Ordering Controls
Stateful Format Controls
Table 23-4. Paired Stateful Controls
Table 23-5. Paired Stateful Controls (Deprecated)
23.3 Deprecated Format Characters
23.4 Variation Selectors
23.5 Private-Use Characters
Private Use Area: U+E000–U+F8FF
Supplementary Private Use Areas
23.6 Surrogates Area
23.7 Noncharacters
23.8 Specials
Byte Order Mark (BOM): U+FEFF
Table 23-6. Unicode Encoding Scheme Signatures
Table 23-7. U+FEFF Signature in Other Charsets
Specials: U+FFF0–U+FFF8
Annotation Characters: U+FFF9–U+FFFB
Figure 23-4. Annotation Characters
Replacement Characters: U+FFFC–U+FFFD
23.9 Tag Characters
Tag Characters: U+E0000–U+E007F
Deprecated Use for Language Tagging

24 About the Code Charts
24.1 Character Names List
Images in the Code Charts and Character Lists
Special Characters and Code Points
Character Names
Informative Aliases
Normative Aliases
Cross References
Information About Languages
Case Mappings
Decompositions
Standardized Variation Sequences
Positional Forms
Block Headers
Subheads
24.2 CJK and Other Ideographs
CJK Unified Ideographs
Table 24-1. IRG Sources
Figure 24-1. CJK Chart Format for the Main CJK Block
Figure 24-2. CJK Chart Format for CJK Extension A
Compatibility Ideographs
Figure 24-3. CJK Chart Format for Compatibility Ideographs
Figure 24-4. Annotations Identifying CJK Unified Ideographs
Tangut Ideographs
24.3 Hangul Syllables

A Notational Conventions
A.1 Typographic Conventions
Code Points
Character Names
Character Blocks
Sequences
Properties and Property Values
Miscellaneous
Operators
Table A-1. Operators
A.2 Extended BNF
Table A-2. Extended BNF
Character Classes
Table A-3. Character Class Examples
A.3 Rendering
Figure A-1. Example of Rendering

B Unicode Publications and Resources
B.1 The Unicode Consortium
The Unicode Technical Committee
Other Activities
B.2 Unicode Publications
B.3 Other Unicode Online Resources
Unicode Online Resources
How to Contact the Unicode Consortium

C Relationship to ISO/IEC 10646
C.1 History
Table C-1. Timeline
C.2 Encoding Forms in ISO/IEC 10646
Zero Extending
Table C-2. Zero Extending
C.3 UTF-8 and UTF-16
UTF-8
UTF-16
C.4 Synchronization of the Standards
C.5 Identification of Features for Unicode
C.6 Character Names
C.7 Character Functional Specifications

D Version History of the Standard
Table D-1. Versions of Unicode and ISO/IEC 10646

E Han Unification History
E.1 Development of the URO
E.2 Continuing Research on Ideographs
Ideographic Rapporteur Group
Ideographic Research Group
E.3 CJK Sources

F Documentation of CJK Strokes
Table F-1. CJK Strokes

I Index