Unicode 8.0 Web Bookmarks
About this page
This page contains hyperlinks to The Unicode Standard, Version 8.0. The Unicode 8.0.0 page lists the contents with links to each PDF file.
PrefaceWhy Unicode?What’s New?Support for Languages and Symbol SetsConformance UpdatesProperty and Behavioral UpdatesDetailed Change InformationOrganization of This StandardConcepts, Architecture, Conformance, and GuidelinesCharacter Block DescriptionsCode ChartsAppendicesReferences and IndexGlossary and Character IndexUnicode Standard AnnexesThe Unicode Character DatabaseUnicode Code ChartsUnicode Technical Standards and Unicode Technical ReportsUpdates and ErrataAcknowledgements1 IntroductionFigure 1-1. Wide ASCII1.1 CoverageStandards CoverageNew Characters1.2 Design GoalsFigure 1-2. Unicode Compared to the 2022 Framework1.3 Text HandlingCharacters and GlyphsText Elements2 General Structure2.1 Architectural ContextBasic Text ProcessesText Elements, Characters, and Text ProcessesFigure 2-1. Text Elements and CharactersText Processes and EncodingCharacter Identity2.2 Unicode Design PrinciplesTable 2-1. The 10 Unicode Design PrinciplesUniversalityEfficiencyCharacters, Not GlyphsFigure 2-2. Characters Versus GlyphsTable 2-2. User-Perceived Characters with Multiple Code PointsFigure 2-3. Unicode Character Code to Rendered GlyphsSemanticsPlain TextLogical OrderFigure 2-4. Bidirectional OrderingFigure 2-5. Writing Direction and NumbersUnificationFigure 2-6. Typeface Variation for the Bone CharacterDynamic CompositionFigure 2-7. Dynamic CompositionEquivalent SequencesStabilityConvertibility2.3 Compatibility CharactersUsageAllocationCompatibility VariantsCompatibility Decomposable CharactersCompatibility Character Versus Compatibility Decomposable Character2.4 Code Points and CharactersFigure 2-8. Abstract and Encoded CharactersTypes of Code PointsTable 2-3. 
Types of Code PointsControl CodesNoncharactersPrivate UseSurrogatesRestricted InterchangeCode Point Semantics2.5 Encoding FormsNon-overlapFigure 2-9. Overlap in Legacy Mixed-Width EncodingsFigure 2-10. Boundaries and InterpretationConformanceExamplesFigure 2-11. Unicode Encoding FormsUTF-32Fixed WidthPreferred UsageUTF-16Optimized for BMPSupplementary Characters and SurrogatesPreferred UsageOriginCollationUTF-8Byte-OrientedVariable WidthASCII TransparencyPreferred UsageSelf-synchronizingComparison of the Advantages of UTF-32, UTF-16, and UTF-8UTF-32 Versus UTF-16Characters Versus Code PointsUTF-8Binary Sorting2.6 Encoding SchemesByte OrderTable 2-4. The Seven Unicode Encoding SchemesEncoding Scheme Versus Encoding FormExamplesFigure 2-12. Unicode Encoding Schemes2.7 Unicode Strings2.8 Unicode AllocationPlanesBasic Multilingual PlaneSupplementary Multilingual PlaneSupplementary Ideographic PlaneSupplementary Special-purpose PlanePrivate Use PlanesAllocation Areas and Character BlocksAllocation AreasBlocksAllocation OrderAssignment of Code Points2.9 Details of AllocationFigure 2-13. Unicode AllocationPlane 0 (BMP)Figure 2-14. Allocation on the BMPASCII and Latin-1 Compatibility AreaGeneral Scripts AreaPunctuation and Symbols AreaSupplementary General Scripts AreaCJK Miscellaneous AreaCJKV Ideographs AreaGeneral Scripts Area (Asia and Africa)Hangul AreaSurrogates AreaPrivate Use AreaCompatibility and Specials AreaPlane 1 (SMP)Figure 2-15. Allocation on Plane 1General Scripts AreasGeneral Scripts Areas (RTL)Cuneiform and Hieroglyphic AreaIdeographic Scripts AreaSymbols AreasPlane 2 (SIP)Other Planes2.10 Writing DirectionFigure 2-16. Writing DirectionsBidirectionalVerticalBoustrophedonOther Historical Directionalities2.11 Combining CharactersCombining CharactersDiacriticsSymbol DiacriticsEnclosing Combining MarksFigure 2-17. Combining Enclosing Marks for SymbolsScript-Specific Combining CharactersSequence of Base Characters and DiacriticsFigure 2-18. 
Sequence of Base Characters and DiacriticsOrderingIndic Vowel SignsFigure 2-19. Reordered Indic Vowel SignsPropertiesFigure 2-20. Properties and Combining Character SequencesMultiple Combining CharactersFigure 2-21. Stacking SequencesTable 2-5. Interaction of Combining CharactersTable 2-6. Nondefault StackingLigated Multiple Base CharactersFigure 2-22. Ligated Multiple Base CharactersExhibiting Nonspacing Marks in Isolation“Characters” and Grapheme Clusters2.12 Equivalent SequencesFigure 2-23. Equivalent SequencesNormalizationFigure 2-24. Canonical OrderingDecompositionsTypes of DecomposablesExamplesFigure 2-25. Types of DecomposablesNon-decomposition of Certain DiacriticsOverlaid and Attached DiacriticsOther DiacriticsCharacter Names and DecompositionSimulated Decomposition in ProcessingSecurity Issue2.13 Special CharactersSpecial Noncharacter Code PointsByte Order Mark (BOM)Unicode SignatureLayout and Format Control CharactersThe Replacement CharacterControl Codes2.14 Conforming to the Unicode StandardCharacteristics of Conformant ImplementationsUnacceptable BehaviorAcceptable BehaviorSupported Subsets3 Conformance3.1 Versions of the Unicode StandardStabilityVersion NumberingMajor and Minor VersionsUpdate VersionScheduling of VersionsErrata and CorrigendaErrataCorrigendaReferences to the Unicode StandardPrecision in Version CitationReferences to Unicode Character PropertiesReferences to Unicode Algorithms3.2 Conformance RequirementsCode Points Unassigned to Abstract CharactersInterpretationModificationCharacter Encoding FormsCharacter Encoding SchemesBidirectional TextNormalization FormsNormative ReferencesUnicode AlgorithmsDefault Casing AlgorithmsUnicode Standard Annexes3.3 SemanticsDefinitionsCharacter Identity and Semantics3.4 Characters and EncodingTable 3-1. Named Unicode Algorithms3.5 PropertiesTypes of PropertiesProperty ValuesDefault Property ValuesClassification of Properties by Their ValuesProperty StatusTable 3-2. 
Normative Character PropertiesTable 3-3. Informative Character PropertiesContext DependenceStability of PropertiesSimple and Derived PropertiesProperty AliasesPrivate Use3.6 CombinationCombining Character SequencesGrapheme ClustersApplication of Combining MarksFigure 3-1. Enclosing MarksCombining Marks and Korean Syllables3.7 DecompositionCompatibility DecompositionCanonical Decomposition3.8 Surrogates3.9 Unicode Encoding FormsTable 3-4. Examples of Unicode Encoding FormsUTF-32UTF-16Table 3-5. UTF-16 Bit DistributionUTF-8Table 3-6. UTF-8 Bit DistributionTable 3-7. Well-Formed UTF-8 Byte SequencesEncoding Form ConversionConstraints on Conversion ProcessesBest Practices for Using U+FFFDTable 3-8. Use of U+FFFD in UTF-8 Conversion3.10 Unicode Encoding SchemesTable 3-9. Summary of UTF-16BE, UTF-16LE, and UTF-16Table 3-10. Summary of UTF-32BE, UTF-32LE, and UTF-323.11 Normalization FormsNormalization StabilityCombining ClassesSpecification of Unicode Normalization FormsStartersTable 3-11. Combining Marks and Starter StatusCanonical Ordering AlgorithmTable 3-12. Reorderable PairsCanonical Composition AlgorithmDefinition of Normalization Forms3.12 Conjoining Jamo BehaviorDefinitionsHangul Syllable DecompositionTable 3-13. Hangul Characters Used in ExamplesCommon ConstantsSyllable IndexArithmetic Decomposition MappingFull Canonical DecompositionExampleHangul Syllable CompositionArithmetic Primary Composite MappingExampleHangul Syllable Name GenerationFull Canonical DecompositionJamo Short Name MappingName ConcatenationExampleSample Code for Hangul AlgorithmsCommon ConstantsHangul DecompositionHangul CompositionHangul Character Name GenerationAdditional Transformations for Hangul Jamo3.13 Default Case AlgorithmsTailoringDefinitionsTable 3-14. Context Specification for CasingDefault Case ConversionDefault Case FoldingDefault Case DetectionTable 3-15. 
Case Detection ExamplesDefault Caseless Matching4 Character PropertiesStatus and AttributesConsistency of Properties4.1 Unicode Character DatabaseUnihan DatabaseStabilityAliasesUCD in XMLOnline Availability4.2 CaseDefinitions of Case and CasingTable 4-1. Relationship of Casing DefinitionsTable 4-2. Case Function Values for StringsCase MappingTable 4-3. Sources for Case Mapping Information4.3 Combining ClassesFigure 4-1. Positions of Common Combining MarksReordrant, Split, and Subjoined Combining MarksReordrant Class Zero Combining MarksTable 4-4. Class Zero Combining Marks—ReordrantTable 4-5. Thai, Lao, and Other Logical Order ExceptionsSplit Class Zero Combining MarksTable 4-6. Class Zero Combining Marks—SplitSubjoined Class Zero Combining MarksTable 4-7. Class Zero Combining Marks—SubjoinedStrikethrough Class Zero Combining MarksTable 4-8. Class Zero Combining Marks—Strikethrough4.4 Directionality4.5 General CategoryTable 4-9. General Category4.6 Numeric ValueDecimal DigitsScript-Specific DigitsIdeographic Numeric ValuesTable 4-10. Primary Numeric IdeographsTable 4-11. Ideographs Used as Accounting Numbers4.7 Bidi MirroredRelated Properties4.8 NameStabilityCharacter Name SyntaxNames as IdentifiersCharacter Name MatchingNamed Character SequencesCharacter Name AliasesTable 4-12. Types of Character Name AliasesUnicode Name PropertyFormal Definition of the Name PropertyName UniquenessInterpretation of Field 1 of UnicodeData.txtControl CodesCode Point LabelsTable 4-13. Construction of Code Point LabelsUse of Character Names in APIs and User InterfacesUse in APIsUser Interfaces4.9 Unicode 1.0 Names4.10 Letters, Alphabetic, and IdeographicLetters and SyllablesAlphabeticIdeographic4.11 Properties Related to Text Boundaries4.12 Characters with Unusual PropertiesTable 4-14. Unusual Properties5 Implementation Guidelines5.1 Data Structures for Character ConversionIssuesMultistage TablesFlat Tables.RangesTwo-Stage TablesFigure 5-1. 
Two-Stage TablesOptimized Two-Stage TableMultistage Table Tuning5.2 Programming Languages and Data TypesUnicode Data Types for CANSI/ISO C wchar_t5.3 Unknown and Missing CharactersReserved and Private-Use Character CodesInterpretable but Unrenderable CharactersDefault Ignorable Code PointsInteracting with Downlevel Systems5.4 Handling Surrogate Pairs in UTF-16Strategies for Surrogate Pair Support5.5 Handling Numbers5.6 NormalizationAlternative SpellingsNormalizationFigure 5-2. Normalization5.7 Compression5.8 Newline GuidelinesDefinitionsTable 5-1. Hex Values for AcronymsEncodingNotationEBCDICNewline FunctionTable 5-2. NLF Platform CorrelationsLine Separator and Paragraph SeparatorRecommendationsConverting from Other Character Code SetsInterpreting Characters in TextConverting to Other Character Code SetsInput and OutputPage Separator5.9 Regular Expressions5.10 Language Information in Plain TextRequirements for Language TaggingLanguage Tags and Han UnificationTypical Scenarios5.11 Editing and SelectionConsistent Text ElementsCluster BoundariesFigure 5-3. Consistent Character BoundariesStacked BoundariesAtomic Character Boundaries.Linear BoundariesNonlinear Boundaries5.12 Strategies for Handling Nonspacing MarksRenderingOther ProcessesKeyboard InputFigure 5-4. Dead Keys Versus Handwriting SequenceTruncationFigure 5-5. Truncating Grapheme Clusters5.13 Rendering Nonspacing MarksFigure 5-6. Inside-Out RuleFallback RenderingFigure 5-7. Fallback RenderingBidirectional PositioningFigure 5-8. Bidirectional PlacementJustificationFigure 5-9. JustificationCanonical EquivalenceTable 5-3. Typing Order Differing from Canonical OrderTable 5-4. Permuting Combining Class WeightsPositioning MethodsPositioning with LigaturesFigure 5-10. Positioning with LigaturesPositioning with Contextual FormsFigure 5-11. Positioning with Contextual FormsPositioning with Enhanced KerningFigure 5-12. 
Positioning with Enhanced Kerning5.14 Locating Text Element Boundaries5.15 Identifiers5.16 Sorting and SearchingCulturally Expected Sorting and SearchingLanguage-Insensitive SortingSearchingSublinear SearchingFigure 5-13. Sublinear Searching5.17 Binary OrderUTF-8 in UTF-16 OrderUTF-16 in UTF-8 Order5.18 Case MappingsTitlecasingComplications for Case MappingChange in LengthGreek iota subscriptContext-dependent Case MappingsLocale-dependent Case MappingsFigure 5-14. Uppercase Mapping for Turkish IFigure 5-15. Lowercase Mapping for Turkish ICaseless CharactersGerman sharp sFigure 5-16. Casing of German Sharp SReversibilityCaseless MatchingStabilityNormalization and CasingTable 5-5. Casing and Normalization in Strings5.19 Mapping Compatibility VariantsConfusables5.20 Unicode SecurityAlternate EncodingsSpoofing5.21 Ignoring Characters in ProcessingCharacters Ignored in Text SegmentationCharacters Ignored in Line BreakingCharacters Ignored in Cursive JoiningCharacters Ignored in IdentifiersCharacters Ignored in Searching and SortingCharacters Ignored for DisplayNormal RenderingFallback RenderingDefault Ignorable Code Point5.22 Best Practice for U+FFFD Substitution6 Writing Systems and PunctuationScripts and BlocksScripts and Writing SystemsPunctuation6.1 Writing SystemsAlphabetsAbjadsSyllabariesAbugidasFigure 6-1. Overriding Inherent VowelsLogosyllabariesTypology of Scripts in the Unicode StandardTable 6-1. Typology of Scripts in the Unicode StandardNotational Systems6.2 General PunctuationUse and InterpretationRenderingWriting DirectionFigure 6-2. Forms of CJK PunctuationLayout ControlsEncoding Characters with Multiple Semantic ValuesBlocks Devoted to PunctuationFormat Control CharactersSpace CharactersTable 6-2. Unicode Space CharactersNo-Break SpaceNarrow No-Break SpaceDashes and HyphensTable 6-3. 
Unicode Dash CharactersSoft HyphenTilde.Dictionary Abbreviation SymbolsPaired PunctuationMirroring of Paired Punctuation.Quotation Marks and BracketsLanguage-Based Usage of Quotation MarksEuropean UsageFigure 6-3. European Quotation MarksGlyph Variation in Curly QuotesTable 6-4. Models of Visual Relationship between Quote GlyphsEast Asian UsageTable 6-5. East Asian Quotation MarksGlyph Variation in East Asian Usage.Figure 6-4. Asian Quotation MarksTable 6-6. Opening and Closing FormsOverloaded Character CodesConsequences for SemanticsApostrophesLetter ApostrophePunctuation ApostropheOther PunctuationHyphenation PointWord Separator Middle DotFraction SlashSpacing Overscores and UnderscoresDoubled PunctuationPeriod or Full StopEllipsisVertical EllipsisLeader DotsOther Basic Latin Punctuation MarksCanonical Equivalence Issues for Greek PunctuationBulletsParagraph MarksNumeric Separators.ObelusCommercial MinusAt SignTable 6-7. Names for the @Archaic Punctuation and Editorial MarksArchaic PunctuationEditorial MarksNew Testament Editorial MarksAncient Greek Editorial MarksFigure 6-5. Examples of Ancient Greek Editorial MarksFigure 6-6. Use of Greek ParagraphosDouble Oblique HyphenIndic PunctuationDandasTable 6-8. Unicode Danda CharactersCJK PunctuationFigure 6-7. CJK ParenthesesWave DashSesame DotsUnknown or Unavailable IdeographsCJK Compatibility FormsVertical FormsStyled Overscores and UnderscoresSmall Form VariantsFullwidth and Halfwidth Variants7 Europe-I7.1 LatinLanguagesDiacritical Marks.Alternative Glyphs.Figure 7-1. Alternative Glyphs in LatinVariations in Diacritical MarksTable 7-1. Preferred Rendering of Cedilla versus Comma BelowLatvian CedillaCedilla and Comma Below in Turkish and RomanianExceptional Case PairsDiacritics on i and jFigure 7-2. Diacritics on i and jVietnameseFigure 7-3. 
Vietnamese Letters and Tone MarksStandards.Related CharactersLetters of Basic Latin: U+0041–U+007ALetters of the Latin-1 Supplement: U+00C0–U+00FFLanguagesOrdinalsLatin Extended-A: U+0100–U+017FCompatibility DigraphsLanguagesLatin Extended-B: U+0180–U+024FArrangementCroatian Digraphs Matching Serbian Cyrillic LettersPinyin Diacritic–Vowel CombinationsCase PairsCaseless LettersGlottal StopIPA Extensions: U+0250–U+02AFStandardsUnificationsIPA AlternatesCase PairsTypographic VariantsAffricate Digraph LigaturesArrangementPhonetic Extensions: U+1D00–U+1DBFTypographic Features of the UPA.Other Phonetic ExtensionsDigraph for thLatin Extended Additional: U+1E00–U+1EFFCapital Sharp SVietnamese Vowel Plus Tone Mark CombinationsLatin Extended-C: U+2C60–U+2C7FUighurClaudian LettersLatin Extended-D: U+A720–U+A7FFEgyptological TransliterationHistoric Mayan LettersEuropean Medievalist LettersInsular and Celticist LettersOrthographic Letter AdditionsSinological DotLatvian LettersAncient Roman Epigraphic LettersLatin Extended-E: U+AB30–U+AB6FLatin Ligatures: U+FB00–U+FB067.2 GreekGreek: U+0370–U+03FFStandardsPolytonic GreekNonspacing MarksTable 7-2. Nonspacing Marks Used with GreekIotaVariant LetterformsFigure 7-4. Variations in Greek Capital Letter UpsilonRepresentative Glyphs for Greek PhiGreek Letters as SymbolsSymbols Versus NumbersCompatibility PunctuationHistoric LettersCoptic-Unique LettersRelated CharactersGreek Extended: U+1F00–U+1FFFSpacing DiacriticsTable 7-3. Greek Spacing and Nonspacing PairsAncient Greek Numbers: U+10140–U+1018FAcrophonic NumeralsOther Numerical SymbolsSymbol for Zero7.3 CopticDevelopment of the Coptic ScriptCasingFont StylesCharacters for Cryptogrammic UseCrossed SheiSupralineationCombining Diacritical MarksPunctuationNumerical Use of LettersFigure 7-5. 
Coptic Numerals7.4 CyrillicStructureHistoric LetterformsGlagoliticCyrillic: U+0400–U+04FFStandardsExtended CyrillicAbkhasianPalochkaCyrillic Supplement: U+0500–U+052FKomiKurdish LettersCyrillic Extended-A: U+2DE0–U+2DFFTitlo LettersFigure 7-6. Combination of Titlo LettersCyrillic Extended-B: U+A640–U+A69FNumeric Enclosing SignsTitlo LettersOld Abkhasian Letters7.5 GlagoliticGlyph FormsOrderingPunctuation and DiacriticsNumerical Use of Letters7.6 ArmenianOrthographyUser CommunityPunctuationPreferred CharactersLigatures7.7 GeorgianScript FormsCase FormsFigure 7-7. Georgian Scripts and CasingMtavruli StylePunctuationHistoric Punctuation7.8 Modifier LettersCase and Modifier LettersGeneral CategoryBlocksCharacter NamesSpacing Modifier Letters: U+02B0–U+02FFPhonetic UsageEncoding PrinciplesSuperscript LettersSpacing Clones of DiacriticsRhotic HookTone LettersFigure 7-8. Tone LettersModifier Tone Letters: U+A700–U+A71F7.9 Combining MarksSequence of Base Letters and Combining MarksMultiple SemanticsGlyphic VariationOverlaid DiacriticsMarks as Spacing CharactersSpacing Clones of Diacritical MarksRelationship to ISO/IEC 8859-1Diacritics Positioned Over Two Base CharactersFigure 7-9. Double DiacriticsFigure 7-10. Positioning of Double DiacriticsFigure 7-11. Use of CGJ with Double DiacriticsDiacritics Positioned Over Three or More Base CharactersSubtending MarksCombining Marks with LigaturesFigure 7-12. Interaction of Combining Marks with LigaturesCombining Diacritical Marks: U+0300–U+036FStandardsUnderlining and OverliningCombining Diacritical Marks Extended: U+1AB0–U+1AFFCombining ParenthesesFigure 7-13. Positioning of Combining ParenthesesCombining Diacritical Marks Supplement: U+1DC0–U+1DFFCombining Marks for Symbols: U+20D0–U+20FFFigure 7-14. Use of Vertical Line Overlay for NegationEnclosing MarksCombining Half Marks: U+FE20–U+FE2FFigure 7-15. 
Double Diacritics and Half MarksCombining Marks in Other Blocks8 Europe-II8.1 Linear AEncodingStructureCharacter NamesDirectionalityNumbers8.2 Linear BLinear B Syllabary: U+10000–U+1007FStandardsLinear B Ideograms: U+10080–U+100FFAegean Numbers: U+10100–U+1013F8.3 Cypriot SyllabaryTable 8-1. Similar Characters in Linear B and Cypriot8.4 Ancient Anatolian AlphabetsLycian: U+10280–U+1029FCarian: U+102A0–U+102DFLydian: U+10920–U+1093FLycianCarianLydian8.5 Old ItalicDirectionality.Punctuation.Numerals.Glyphs.Figure 8-1. Distribution of Old Italic8.6 RunicThe Runic AlphabetDirectionRepresentative GlyphsUnificationsLong-Branch and Short-TwigStaveless RunesPunctuation MarksGolden NumbersEncoding8.7 Old HungarianStructureDirectionalityPunctuation and Numbers8.8 GothicDiacritics.Numerals.Punctuation.8.9 ElbasanStructureAccents and Other MarksNamesNumerals and Punctuation8.10 Caucasian AlbanianStructureAbbreviationsNumeralsPunctuation8.11 Old PermicStructureCombining LettersCombining MarksTable 8-2. Combining Marks Used in Old PermicNumeralsPunctuation8.12 OghamStructure.Rendering.Forfeda (Supplementary Characters)8.13 ShavianStructure.Collation9 Middle East-I9.1 HebrewHebrew: U+0590–U+05FFDirectionalityCursive.StandardsVowels and Other Pronunciation MarksShin and SinFinal (Contextual Variant) LetterformsYiddish DigraphsPunctuationCantillation MarksPositioningMetegAtnah Hafukh and Qamats QatanHolam Male and Holam HaserPuncta ExtraordinariaNun HafukhaCurrency SymbolAlphabetic Presentation Forms: U+FB1D–U+FB4FUse of Wide Letters9.2 ArabicArabic: U+0600–U+06FFFigure 9-1. Directionality and Cursive ConnectionDirectionalityStandardsEncoding PrinciplesPunctuationThe Non-joiner and the JoinerFigure 9-2. Using a JoinerFigure 9-3. Using a Non-joinerFigure 9-4. Combinations of Joiners and Non-joinersTashkil Nonspacing MarksFigure 9-5. Placement of HarakatArabic-Indic DigitsTable 9-1. Arabic Digit NamesTable 9-2. 
Glyph Variation in Eastern Arabic-Indic DigitsExtended Arabic LettersKoranic Annotation SignsAdditional Vowel MarksHonorificsArabic Mathematical SymbolsDate SeparatorFull StopCurrency SymbolsEnd of AyahOther Signs Spanning NumbersFigure 9-6. Arabic Year SignPoetic Verse SignArabic Cursive JoiningMinimum Rendering RequirementsJoining TypesTable 9-3. Primary Arabic Joining TypesTable 9-4. Derived Arabic Joining TypesJoining RulesTable 9-5. Arabic Glyph TypesArabic LigaturesLigature ClassesTable 9-6. Arabic Obligatory Ligature Joining GroupsLigature RulesTable 9-7. Arabic Ligature NotationOptional FeaturesArabic Joining GroupsDual-JoiningTable 9-8. Dual-Joining Arabic CharactersRight-JoiningTable 9-9. Right-Joining Arabic CharactersLetter hehLetter yehTable 9-10. Forms of the Arabic Letter yehNoon GhunnaCombining Hamza AboveTable 9-11. Arabic Letters With Hamza AboveJawiKurdishArabic Supplement: U+0750–U+077FMarwariArabic Extended-A: U+08A0–U+08FFArabic Presentation Forms-A: U+FB50–U+FDFFOrnate ParenthesesNuktasWord LigaturesArabic Presentation Forms-B: U+FE70–U+FEFFSpacing and Tatweel Forms of Arabic DiacriticsZero Width No-Break Space9.3 SyriacSyriac: U+0700–U+074FSyriac LanguageLanguages Using the Syriac Script.ShapingDirectionalitySyriac Type StylesCharacter NamesSyriac Abbreviation MarkFigure 9-7. Syriac AbbreviationFigure 9-8. Use of SAMLigatures and Combining CharactersDiacritical Marks and VowelsPunctuationDigitsHarklean MarksDalath and RishSemkathVowel MarksMiscellaneous Diacritics.Table 9-12. Miscellaneous Syriac Diacritic UseUse of Characters of the Arabic BlockSyriac ShapingMinimum Rendering RequirementsJoining TypesTable 9-13. Syriac Final Alaph Glyph TypesSyriac Character Joining GroupsTable 9-14. Dual-Joining Syriac CharactersTable 9-15. Right-Joining Syriac CharactersTable 9-16. Syriac Alaph Glyph FormsLigature ClassesTable 9-17. Syriac Ligatures9.4 SamaritanDirectionalityVowel SignsConsonant ModifiersPunctuationTable 9-18. 
Samaritan Performative Punctuation Marks9.5 MandaicLetter ItStructurePunctuationDirectionalityShaping and Layout BehaviorTable 9-19. Dual-Joining Mandaic CharactersTable 9-20. Right-Joining Mandaic CharactersLinebreaking10 Middle East-II10.1 Old North ArabianStructureOrderingNumbersPunctuation10.2 Old South ArabianDirectionalityStructureSegmentationMonogramsNumbersTable 10-1. Old South Arabian Numeric CharactersTable 10-2. Number Formation in Old South ArabianCharacter Names10.3 PhoenicianDirectionalityPunctuationStylistic VariationNumeralsCharacter Names10.4 Imperial AramaicDirectionalityPunctuationNumbersTable 10-3. Number Formation in Aramaic10.5 ManichaeanDirectionalityStructureShapingTable 10-4. Dual-Joining Manichaean LettersTable 10-5. Right-Joining Manichaean LettersTable 10-6. Left-Joining Manichaean LettersTable 10-7. Non-Joining Manichaean LettersTable 10-8. Manichaean LigaturesNumbersPunctuation10.6 Pahlavi and ParthianInscriptional Parthian: U+10B40–U+10B5FInscriptional Pahlavi: U+10B60–U+10B7FDirectionalityShaping and Layout BehaviorTable 10-9. Inscriptional Parthian Shaping BehaviorNumbersHeterogramsPsalter Pahlavi: U+10B80–U+10BAFStructureNumbersPunctuation10.7 AvestanDirectionalityShaping BehaviorTable 10-10. Avestan Shaping BehaviorPunctuation10.8 NabataeanStructureDirectionalityNumeralsPunctuation10.9 PalmyreneStructureDirectionalityNumeralsSymbolsPunctuation10.10 HatranStructureDirectionalityNumeralsPunctuation11 Cuneiform and Hieroglyphs11.1 Sumero-AkkadianCuneiform: U+12000–U+123FFEarly History of CuneiformGeographic RangeTable 11-1. 
Cuneiform Script UsageSources and CoverageSimple SignsComplex and Compound SignsMergers and SplitsFontsGlyph Variants Acquiring Independent Semantic StatusFormattingOrderingOther StandardsCuneiform Numbers and Punctuation: U+12400–U+1247FCuneiform PunctuationCuneiform NumeralsEarly Dynastic Cuneiform: U+12480–U+1254F11.2 UgariticVariant GlyphsOrdering.Character Names.11.3 Old PersianDirectionalityRepertoireNumeralsVariants11.4 Egyptian HieroglyphsStructureDirectionalityRenderingTable 11-2. Hieroglyphic Character SequenceFigure 11-1. Interpretation of Hieroglyphic MarkupHieratic FontsRepertoireCharacter NamesSign ClassificationEnclosuresNumerals11.5 MeroiticStructureDirectionalityShapingPunctuationSymbolsMeroitic Cursive Numbers11.6 Anatolian HieroglyphsStructureDirectionalityRepertoireAnnotationsPunctuationNumbersRendering12 South and Central Asia-I12.1 DevanagariDevanagari: U+0900–U+097FStandardsEncoding PrinciplesPrinciples of the Devanagari ScriptRendering Devanagari CharactersConsonant LettersIndependent Vowel LettersDependent Vowel Signs (Matras)Vowel LettersTable 12-1. Devanagari Vowel LettersVirama (Halant)Figure 12-1. Dead Consonants in DevanagariConsonant ConjunctsFigure 12-2. Conjunct Formations in DevanagariExplicit Virama (Halant)Figure 12-3. Preventing Conjunct Forms in DevanagariExplicit Half-ConsonantsFigure 12-4. Half-Consonants in DevanagariFigure 12-5. Independent Half-Forms in DevanagariFigure 12-6. Half-Consonants in OriyaConsonant FormsFigure 12-7. Consonant Forms in Devanagari and OriyaRendering DevanagariRules for RenderingNotationDead Consonant RuleConsonant RA RulesModifier Mark RulesLigature RulesMemory Representation and Rendering OrderFigure 12-8. Rendering Order in DevanagariSample Half-FormsTable 12-2. Sample Devanagari Half-FormsSample LigaturesTable 12-3. Sample Devanagari LigaturesLigature Forms for Ra + Vocalic LiquidsTable 12-4. RA + Vocalic Letter Ligature FormsSample Half-Ligature FormsTable 12-5. 
Sample Devanagari Half-Ligature FormsLanguage-Specific AllographsTable 12-6. Marathi and Nepali AllographsCombining MarksDevanagari Digits, Punctuation, and SymbolsDigitsPunctuationOther SymbolsExtensions in the Main Devanagari BlockSindhi LettersKonkaniBodo, Dogri, and MaithiliFigure 12-9. Use of Apostrophe in Bodo, Dogri and MaithiliFigure 12-10. Use of Avagraha in DogriKashmiri LettersLetters for Bihari LanguagesTable 12-7. Devanagari Vowels Used in Bihari LanguagesPrishthamatra OrthographyTable 12-8. Prishthamatra OrthographyDevanagari Extended: U+A8E0–U+A8FFCantillation Marks for the SāmavedaNasalization MarksEditorial MarksVedic Extensions: U+1CD0–U+1CFFTone MarksDiacritics for the Visarga.Nasalization MarksArdhavisarga12.2 Bengali (Bangla)Virama (Hasant)Vowel LettersTable 12-9. Bengali Vowel LettersTable 12-10. Diphthong Vowel Letters in KokborokTwo-Part Vowel SignsSpecial CharactersHistoric CharactersCharacters for AssameseTable 12-11. Assamese Consonant-Vowel CombinationsRendering BehaviorConsonant-Vowel LigaturesTable 12-12. Bengali Consonant-Vowel CombinationsFigure 12-11. Requesting Bengali Consonant-Vowel LigatureFigure 12-12. Blocking Bengali Consonant-Vowel LigatureKhiyaKhanda Ta.Figure 12-13. Bengali Syllable ttaYa-phalaaInteraction of Repha and Ya-phalaaPunctuationTruncationTable 12-13. Use of Apostrophe in Bangla12.3 GurmukhiEncoding PrinciplesVowel LettersTable 12-14. Gurmukhi Vowel LettersTonesOrderingRendering BehaviorTable 12-15. Gurmukhi ConjunctsTable 12-16. Additional Pairin and Addha Forms in GurmukhiTable 12-17. Use of Joiners in GurmukhiOther SymbolsPunctuation12.4 GujaratiVowel LettersTable 12-18. Gujarati Vowel LettersRendering BehaviorTable 12-19. Gujarati ConjunctsPunctuation12.5 Oriya (Odia)Special CharactersVowel LettersTable 12-20. Oriya Vowel LettersRendering BehaviorTable 12-21. Oriya ConjunctsConsonant FormsVowelsTable 12-22. Oriya Vowel PlacementOriya VA and WA.Punctuation and SymbolsTable 12-23. 
Ligation for the Syllable omFraction Characters12.6 TamilTamil: U+0B80–U+0BFFVirama (Puḷḷi)Figure 12-14. Kssa Ligature in TamilRendering of the Tamil ScriptTamil VowelsIndependent Versus Dependent VowelsLeft-Side VowelsTable 12-24. Tamil Vowel ReorderingTwo-Part VowelsFigure 12-15. Tamil Two-Part VowelsTable 12-25. Tamil Vowel Splitting and ReorderingFigure 12-16. Vowel Reordering Around a Tamil ConjunctTamil LigaturesLigatures with Vowel iFigure 12-17. Tamil Ligatures with iLigatures with Vowel uTable 12-26. Tamil Ligatures with uFigure 12-18. Spacing Forms of Tamil uLigatures with raFigure 12-19. Tamil Ligatures with raLigatures with aa in Traditional Tamil OrthographyFigure 12-20. Traditional Tamil Ligatures with aaFigure 12-21. Traditional Tamil Ligatures with oLigatures with ai in Traditional Tamil OrthographyFigure 12-22. Traditional Tamil Ligatures with aiFigure 12-23. Vowel ai in Modern TamilTamil aythamPunctuationNumbersTamil Named Character SequencesTable 12-27. Tamil Vowels, Consonants, and Syllables12.7 TeluguVowel LettersTable 12-28. Telugu Vowel LettersRendering BehaviorNakāra-PolluTable 12-29. Rendering of Telugu na + viramaRephSpecial CharactersFractionsPunctuation12.8 KannadaKannada: U+0C80–U+0CFFPrinciples of the Kannada ScriptVowel LettersTable 12-30. Kannada Vowel LettersConsonant ConjunctsSpecial CharactersKannada Letter LLLAFigure 12-24. Indicating Retroflexion in Badaga VowelsRendering KannadaExplicit Virama (Halant)Vowelless NATable 12-31. Rendering of Kannada na + viramaConsonant Clusters Involving RAJihvamuliya and UpadhmaniyaModifier Mark RulesAvagraha SignPunctuation12.9 MalayalamVowel LettersTable 12-32. Malayalam Vowel LettersTwo-Part VowelsHistoric CharactersMalayalam Orthographic ReformTable 12-33. Malayalam Orthographic ReformRendering MalayalamCandrakkalaTable 12-34. Malayalam ConjunctsTable 12-35. Candrakkala ExamplesExplicit CandrakkalaRequesting Traditional LigaturesRequesting Open Forms of ConjunctsTable 12-36. 
Use of Joiners in Malayalam
Anusvara
Chillu Forms
Special Cases Involving rra
Table 12-37. Malayalam /rara/ and /uua/
Table 12-38. Malayalam /nr/ and /nt/
Dot Reph
Legacy Chillu Sequences
Table 12-39. Atomic Encoding of Malayalam Chillus
Malayalam Numbers and Punctuation
Archaic Numbers
Date Mark
Punctuation
13 South and Central Asia-II
13.1 Thaana
Directionality
Vowels
Table 13-1. Thaana Glyph Placement
Numerals
Punctuation
Character Names and Arrangement
13.2 Sinhala
Vowel Letters
Table 13-2. Sinhala Vowel Letters
Other Letters for Tamil.
Punctuation
Digits
Sinhala Archaic Numbers: U+111E0–U+111FF
13.3 Tibetan
General Principles of the Tibetan Script
Figure 13-1. Tibetan Syllable Structure
Consonants
Vowels
Coding Order
Allographical Considerations
Head Position “ra”
Full-Form “ra” in Head Position.
Subjoined Position “wa”, “ya”, and “ra”
Halanta (Srog-Med).
Line Breaking Considerations
Tibetan Punctuation
Svasti Signs
Other Characters
Tibetan Half-Numbers
Tibetan Transliteration and Transcription of Other Languages
Other Signs
Traditional Text Formatting and Line Justification
Figure 13-2. Justifying Tibetan Tseks
Tibetan Shorthand Abbreviations (bskungs-yig) and Limitations of the Encoding
13.4 Mongolian
History
Directionality
Encoding Principles
Figure 13-3. Mongolian Glyph Convergence
Cursive Joining
Figure 13-4. Mongolian Consonant Ligation
Figure 13-5. Mongolian Positional Forms
Free Variation Selectors
Figure 13-6. Mongolian Free Variation Selector
Representative Glyphs
Vowel Harmony
Figure 13-7. Mongolian Gender Forms
Narrow No-Break Space
Mongolian Vowel Separator
Figure 13-8. Mongolian Vowel Separator
Numbers
Punctuation
Nirugu
Syllable Boundary Marker
13.5 Limbu
Consonants
Vowels
Vowel Length
Glottalization
Collating Order
Glyph Placement
Table 13-3. Positions of Limbu Combining Characters
Punctuation
Digits
13.6 Meetei Mayek
Structure
Vowel Letters
Final Consonants
Abbreviations
Order
Punctuation
Digits
13.7 Mro
Structure
Character Names
Digits
Mro has a script-specific set of digits
Punctuation
13.8 Warang Citi
Structure
Digits and Numbers
Punctuation
13.9 Ol Chiki
Structure
Digits
Punctuation
Modifier Letters
Glottalization
Aspiration
Ligatures
13.10 Chakma
Independent Vowels
Vowel Killer and Virama
Chakma Fonts
Punctuation
Digits
13.11 Lepcha
Structure
Vowels
Medials
Retroflex Consonants
Ordering of Syllable Components
Table 13-4. Lepcha Syllabic Structure
Rendering
Digits
Punctuation
13.12 Saurashtra
Glyph Placement
Digits
Punctuation
Saurashtra Consonant Sign Haaru
14 South and Central Asia-III
14.1 Brahmi
Encoding Model
Vowel Letters
Table 14-1. Brahmi Vowel Letters
Rendering Behavior
Figure 14-1. Consonant Ligatures in Brahmi
Vowel Modifiers
Old Tamil Brahmi
Bhattiprolu Brahmi
Punctuation
Numerals
Table 14-2. Brahmi Positional Digits
14.2 Kharoshthi
Kharoshthi: U+10A00–U+10A5F
Figure 14-2. Geographical Extent of the Kharoshthi Script
Directionality
Diacritical Marks and Vowels
Numerals
Figure 14-3. Kharoshthi Number 1996
Punctuation
Word Breaks, Line Breaks, and Hyphenation
Sorting
Rendering Kharoshthi
Figure 14-4. Kharoshthi Rendering Example
Combining Vowels
Table 14-3. Kharoshthi Vowel Signs
Combining Vowel Modifiers
Table 14-4. Kharoshthi Vowel Modifiers
Combining Consonant Modifiers
Table 14-5. Kharoshthi Consonant Modifiers
Virama
Table 14-6. Examples of Kharoshthi Virama
14.3 Phags-pa
History
Basic Structure
Syllable Division
Candrabindu
Figure 14-5. Phags-pa Syllable Om
Alternate Letters
Numbers
Punctuation
Positional Variants
Table 14-7. Phags-pa Positional Forms of I, U, E, and O
Mirrored Variants
Table 14-8. Contextual Glyph Mirroring in Phags-pa
Table 14-9. Phags-pa Standardized Variants
Figure 14-6. Phags-pa Reversed Shaping
Cursive Joining
14.4 Old Turkic
Structure
Directionality
Punctuation
15 South and Central Asia-IV
15.1 Syloti Nagri
Virama and Conjuncts
Digits
Punctuation
Poetry Marks
15.2 Kaithi
Standards
Styles
Rendering Behavior
Vowel Letters
Consonant Conjuncts
Ruled Lines
Nukta
Punctuation
Digits
15.3 Sharada
Rendering Behavior
Ruled Lines
Virama
Candrabindu and Avagraha
Jihvamuliya and Upadhmaniya
Punctuation
Digits
15.4 Takri
Vowel Letters
Table 15-1. Takri Vowel Letters
Consonant Conjuncts
Nukta
Headlines
Punctuation
Fractions
15.5 Siddham
Nukta
Virama and Conjuncts
Figure 15-1. Siddham Consonant Cluster
Head Marks
Repetition Marks
Section Signs
Punctuation
Table 15-2. Siddham Punctuation Characters
15.6 Mahajani
Structure
Digits
Other Symbols
Punctuation
15.7 Khojki
Structure
Punctuation
Digits
15.8 Khudawadi
Structure
Vowel Letters
Table 15-3. Khudawadi Vowel Letters
Consonant Conjuncts
Nasalization
Nukta
Table 15-4. Representation of Arabic Sounds in Khudawadi
Punctuation
Digits
15.9 Multani
Structure
Digits
Punctuation
15.10 Tirhuta
Structure
Vowels
Table 15-5. Tirhuta Vowel Letters
Consonants
Virama
Nasalization
Characters for Representing Sanskrit
Nukta
Punctuation
Special Signs
Numbers
15.11 Modi
Structure
Vowel Letters
Table 15-6. Modi Vowel Letters
Rendering
Consonant Clusters Involving ra
Figure 15-2. Modi Shaping for ra
Punctuation and Word Boundaries
Various Signs
Numbers
15.12 Grantha
Rendering Behavior
Consonant Clusters
Figure 15-3. Splitting Large Conjunct Stacks in Grantha
Virama
Table 15-7. Rendering of Explicit Virama Forms in Grantha
Vowels
Signs
Cantillation Marks
Table 15-8. Additional Svara Marks used in Grantha
Punctuation
Numbers
15.13 Ahom
Structure
Vowels
Syllabic Structure
Numerals
Punctuation
Variant Forms
15.14 Sora Sompeng
Encoding Structure
Character Names
Punctuation
Linebreaking
16 Southeast Asia
16.1 Thai
Standards.
Encoding Principles.
Table 16-1. Glyph Positions in Thai Syllables
Rendering of Thai Combining Marks
Thai Punctuation
Spacing
Thai Transcription of Pali and Sanskrit
Patani Malay
16.2 Lao
Encoding Principles
Punctuation
Glyph Placement
Table 16-2. Glyph Positions in Lao Syllables
Additional Letters
Rendering of Lao Combining Marks
Lao Aspirated Nasals
16.3 Myanmar
Myanmar: U+1000–U+109F
Standards
Encoding Principles
Composite Characters
Encoding Subranges
Conjuncts
Kinzi
Medial Consonants
Asat
Contractions
Great sa
Tall aa
Ordering of Syllable Components
Table 16-3. Modern Burmese Syllabic Structure
Spacing.
Myanmar Extended-A: U+AA60–U+AA7F
Khamti Shan
Consonants
Vowels
Tones
Table 16-4. Khamti Shan Tone Marks
Digits
Other Symbols
Subjoined Characters
Historical Khamti Shan
Aiton and Phake
Consonants
Subjoined Consonants
Vowels
Ligatures
Tones
Myanmar Extended-B: U+A9E0–U+A9FF
16.4 Khmer
Khmer: U+1780–U+17FF
Principles of the Khmer Script
Glottal Consonant
Table 16-5. Independent Khmer Vowel Characters
Subscript Consonants
Subscript Independent Vowel Signs
Consonant Registers
Table 16-6. Two Registers of Khmer Consonants
Encoding Principles
Subscript Consonant Signs
Table 16-7. Khmer Subscript Consonant Signs
Dependent Vowel Signs
Table 16-8. Khmer Composite Dependent Vowel Signs with Nikahit
Independent Vowel Characters
Subscript Independent Vowel Signs
Table 16-9. Khmer Subscript Independent Vowel Signs
Other Signs as Syllabic Components
Ligatures
Figure 16-1. Common Ligatures in Khmer
Multiple Glyphs
Figure 16-2. Common Multiple Forms in Khmer
Characters Whose Use Is Discouraged
Ordering of Syllable Components.
Figure 16-3. Examples of Syllabic Order in Khmer
Consonant Shifters
Ligature Control
Figure 16-4. Ligation in Muul Style in Khmer
Spacing.
Khmer Symbols: U+19E0–U+19FF
Symbols
16.5 Tai Le
Table 16-10. Tai Le Tone Marks
Digits.
Table 16-11. Myanmar Digits
Punctuation.
16.6 New Tai Lue
Structure
Visual Order
Two-Part Vowels
Table 16-12. New Tai Lue Vowel Placement
Final Consonants
Tones
Table 16-13. New Tai Lue Registers and Tones
Digits
16.7 Tai Tham
Consonants
Independent Vowels
Dependent Consonant Signs
Dependent Vowel Signs
Tone Marks
Other Combining Marks
Digits
Punctuation
Collating Order
Linebreaking
16.8 Tai Viet
Structure
Visual Order
Tone Classes and Tone Marks
Final Consonants
Symbols and Punctuation
Table 16-14. Tai Viet Symbols and Punctuation
Word Spacing
Collating Order
16.9 Kayah Li
Structure
Vowels
Tones
Digits
Punctuation
16.10 Cham
Structure
Independent Vowel Letters
Consonants
Ordering of Syllable Components
Table 16-15. Cham Syllabic Structure
Digits
Punctuation
Line Breaking
16.11 Pahawh Hmong
Character Names
Structure
Figure 16-5. Pahawh Hmong Syllable Structure
Vowels
Consonants
Combining Marks
Punctuation and Other Symbols
Digits and Numbers
Logographs
16.12 Pau Cin Hau
Structure
Digits
Punctuation
17 Indonesia and Oceania
17.1 Philippine Scripts
Tagalog: U+1700–U+171F
Hanunóo: U+1720–U+173F
Buhid: U+1740–U+175F
Tagbanwa: U+1760–U+177F
Principles of the Philippine Scripts
Consonant Letters.
Independent Vowel Letters.
Dependent Vowel Signs.
Virama.
Directionality.
Rendering.
Table 17-1. Hanunóo and Buhid Vowel Sign Combinations
Punctuation.
17.2 Buginese
Repertoire
Structure
Ligature
Figure 17-1. Buginese Ligature
Order
Punctuation
Numerals
17.3 Balinese
Structure
Table 17-2. Balinese Base Consonants and Conjunct Forms
Table 17-3. Sasak Extensions for Balinese
Behavior of ra
Figure 17-2. Writing dharma in Balinese
Behavior of ra repa
Rendering
Table 17-4. Balinese Consonant Clusters with u and u:
Nukta
Ordering
Punctuation
Hyphenation
Musical Symbols
Modre Symbols
17.4 Javanese
Consonants
Independent Vowels
Dependent Vowels
Figure 17-3. Representation of Javanese Two-Part Vowels
Consonant Signs
Rendering
Digits
Punctuation
Reduplication
Ordering of Syllable Components
Linebreaking
17.5 Rejang
Structure
Rendering
Ordering
Digits
Punctuation
17.6 Batak
Structure
Rendering
Punctuation
Linebreaking
17.7 Sundanese
Sundanese: U+1B80–U+1BBF
Structure
Medials
Final Consonants
Combining Marks
Historic Characters
Additional Consonants
Digits
Punctuation
Ordering
Ordering of Syllable Components
Table 17-5. Modern Sundanese Syllabic Structure
Rendering
Sundanese Supplement: U+1CC0–U+1CCF
18 East Asia
18.1 Han
CJK Unified Ideographs
Blocks Containing Han Ideographs
Table 18-1. Blocks Containing Han Ideographs
Table 18-2. Small Extensions to the URO
IICore
General Characteristics of Han Ideographs
Table 18-3. Common Han Characters
Terminology
Distinguishing Han Character Usage Between Languages
Figure 18-1. Han Spelling
Figure 18-2. Semantic Context for Han Characters
Simplified and Traditional Chinese
Dialects and Early Forms of Chinese
Sorting Han Ideographs.
Character Glyphs
Principles of Han Unification
Three-Dimensional Conceptual Model
Figure 18-3. Three-Dimensional Conceptual Model
Unification Rules
Figure 18-4. CJK Source Separation
Table 18-4. Source Encoding for Sword Variants
Figure 18-5. Not Cognates, Not Unified
Abstract Shape
Two-Level Classification
Ideographic Component Structure
Figure 18-6. Ideographic Component Structure
Figure 18-7. The Most Superior Node of an Ideographic Component
Ideograph Features
Uniqueness or Unification
Spatial Positioning
Examples
Table 18-5. Ideographs Not Unified
Table 18-6. Ideographs Unified
Han Ideograph Arrangement
Table 18-7. Han Ideograph Arrangement
Radical-Stroke Indices
Mappings for Han Ideographs
CJK Unified Ideographs Extension B: U+20000–U+2A6D6
CJK Unified Ideographs Extension C: U+2A700–U+2B734
CJK Unified Ideographs Extension D: U+2B740–U+2B81D
CJK Unified Ideographs Extension E: U+2B820–U+2CEA1
CJK Compatibility Ideographs: U+F900–U+FAFF
CJK Compatibility Supplement: U+2F800–U+2FA1D
Kanbun: U+3190–U+319F
Symbols Derived from Han Ideographs
CJK and KangXi Radicals: U+2E80–U+2FD5
Standards.
Semantics.
CJK Additions from HKSCS and GB 18030
CJK Strokes: U+31C0–U+31EF
18.2 Ideographic Description Characters
Applicability to Other Scripts
Ideographic Description Sequences
Figure 18-8. Using the Ideographic Description Characters
Equivalence.
Interaction with the Ideographic Variation Mark.
Rendering.
Character Boundaries.
Standards.
18.3 Bopomofo
Standards
Mandarin Tone Marks
Table 18-8. Mandarin Tone Marks
Standard Mandarin Bopomofo
Extended Bopomofo.
Extended Bopomofo Tone Marks.
Table 18-9. Minnan and Hakka Tone Marks
Rendering of Bopomofo.
18.4 Hiragana and Katakana
Hiragana: U+3040–U+309F
Standards
Combining Marks
Iteration Marks
Vertical Text Digraph
Katakana: U+30A0–U+30FF
Standards
Punctuation-like Characters
Vertical Text Digraph
Katakana Phonetic Extensions: U+31F0–U+31FF
Standards
Kana Supplement U+1B000–U+1B0FF
Figure 18-9. Japanese Historic Kana for e and ye
18.5 Halfwidth and Fullwidth Forms
Unifications
18.6 Hangul
Hangul Jamo: U+1100–U+11FF
Hangul Jamo Extended-A: U+A960–U+A97F
Hangul Jamo Extended-B: U+D7B0–U+D7FF
Hangul Compatibility Jamo: U+3130–U+318F
Standards
Normalization
Table 18-10. Separating Jamo Characters
Hangul Syllables: U+AC00–U+D7A3
Standards
Equivalence
Hangul Syllable Composition
Hangul Syllable Decomposition
Hangul Syllable Name
Hangul Syllable Representative Glyph
Table 18-11. Line-Based Placement of Jungseong
Collation
18.7 Yi
Traditional Yi Script
Standardized Yi Script
Standards
Naming Conventions and Order
Yi Syllable Iteration Mark
Punctuation
Rendering
Yi Radicals
18.8 Lisu
Structure
Tone Letters
Table 18-12. Lisu Tone Letters
Other Modifier Letters
Digits and Separators
Punctuation
Table 18-13. Punctuation Adopted in Lisu Orthography
Linebreaking
Word Separation
18.9 Miao
Encoding Principles
Tone Marks
Rendering of “wart”
Ordering
Digits
Punctuation
19 Africa
19.1 Ethiopic
Ethiopic: U+1200–U+137F
Basic and Extended Ethiopic.
Encoding Principles.
Variant Glyph Forms.
Labialized Subseries.
Table 19-1. Labialized Forms in Ethiopic -WAA
Table 19-2. Labialized Forms in Ethiopic -WE
Keyboard Input.
Syllable Names.
Encoding Order and Sorting.
Word Separators.
Section Mark
Diacritical Marks.
Numbers.
Ethiopic Extensions
19.2 Osmanya
Structure
Ordering
Character Names and Glyphs
19.3 Tifinagh
History
Source Standards
Ordering
Directionality
Diacritical Marks.
Contextual Shaping
Figure 19-1. Tifinagh Contextual Shaping
Bi-Consonants
Figure 19-2. Tifinagh Consonant Joiner and Bi-consonants
19.4 N’Ko
Character Names and Block Name
Structure
Diacritical Marks
Table 19-3. N’Ko Diacritic Usage
Table 19-4. N’Ko Tone Diacritics on Vowels
Digits
Ordinal Numbers
Figure 19-3. Examples of N’Ko Ordinals
Punctuation
Ordering
Rendering
Table 19-5. N’Ko Letter Shaping
19.5 Vai
Sources
Basic Structure
Historic Syllables
Logograms
Digits
Punctuation
Segmentation
Ordering
19.6 Bamum
Bamum: U+A6A0–U+A6FF
Structure
Diacritical Marks
Punctuation
Digits
Bamum Supplement: U+16800–U+16A3F
19.7 Bassa Vah
Structure
Punctuation and Digits
19.8 Mende Kikakui
Structure
Directionality
Numbers
Table 19-6. Number Formation in Mende Kikakui
20 Americas
20.1 Cherokee
Structure
Casing
Tones.
Input
Numbers.
Punctuation.
Standards.
20.2 Canadian Aboriginal Syllabics
Canadian Aboriginal Syllabics: U+1400–U+167F
Organization
Arrangement
Extensions
Punctuation and Symbols
Canadian Aboriginal Syllabics Extended: U+18B0–U+18FF
20.3 Deseret
Letter Names and Shapes.
Structure.
Sorting.
Typographic Conventions.
Figure 20-1. Short Words Equivalent to Deseret Letter Names
Phonetics.
Table 20-1. IPA Transcription of Deseret
21 Notational Systems
21.1 Braille
Example
Usage Model.
Imaging.
Script
21.2 Western Musical Symbols
Glyphs
Symbols in Other Blocks
Processing.
Input Methods.
Directionality.
Figure 21-1. Examples of Specialized Music Layout
Format Characters.
Precomposed Note Characters.
Figure 21-2. Precomposed Note Characters
Alternative Noteheads.
Figure 21-3. Alternative Noteheads
Augmentation Dots and Articulation Symbols.
Figure 21-4. Augmentation Dots and Articulation Symbols
Ornamentation.
Table 21-1. Examples of Ornamentation
Gregorian
Kievan
21.3 Byzantine Musical Symbols
Processing.
21.4 Ancient Greek Musical Notation
Unification
Table 21-2. Representation of Ancient Greek Vocal and Instrumental Notation
Naming Conventions
Font
Combining Marks
21.5 Duployan
Structure
Shorthand Format Characters
21.6 Sutton SignWriting
Structure
Repertoire
Modifiers
Punctuation
22 Symbols
22.1 Currency Symbols
Unification
Figure 22-1. Alternative Glyphs for Dollar Sign
Fonts.
Table 22-1. Currency Symbols Encoded in Other Blocks
Lira Sign
Dollar and Peso
Yen and Yuan
Euro Sign
Indian Rupee Sign
Turkish Lira Sign
Ruble Sign
Lari Sign
Other Currency Symbols
22.2 Letterlike Symbols
Letterlike Symbols: U+2100–U+214F
Numero Sign
Figure 22-2. Alternative Glyphs for Numero Sign
Unit Symbols
Compatibility
Styles
Standards
Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF
Words Used as Variables.
Mathematical Alphabets
Basic Set of Alphanumeric Characters.
Additional Characters.
Dotless Characters
Figure 22-3. Wide Mathematical Accents
Semantic Distinctions.
Figure 22-4. Style Variants and Semantic Distinctions in Mathematics
Mathematical Alphabets.
Table 22-2. Mathematical Alphanumeric Symbols
Compatibility Decompositions.
Fonts Used for Mathematical Alphabets
Fraktur
Math Italics
Figure 22-5. Easily Confused Shapes for Mathematical Glyphs
Hard-to-Distinguish Letters.
Font Support for Combining Diacritics.
Type Style for Script Characters.
Double-Struck Characters.
Arabic Mathematical Alphabetic Symbols: U+1EE00–U+1EEFF
Shaping
Large Operators
Properties
22.3 Numerals
Encoding Principles
Decimal Digits
Table 22-3. Script-Specific Decimal Digits
Exceptions
CJK Ideographs Used as Decimal Digits
Figure 22-6. CJK Ideographic Numbers
Other Digits
Hexadecimal Digits
Compatibility Digits
Table 22-4. Compatibility Digits
Parsing of Superscript and Subscript Digits
Numeric Bullets
Glyph Variants of Decimal Digits
Figure 22-7. Regular and Old Style Digits
Accounting Numbers
Non-Decimal Radix Systems
Ethiopic Numerals
Cuneiform Numerals
Other Ancient Numeral Systems
Acrophonic Systems and Other Letter-based Numbers
Roman Numerals
Greek Numerals
Coptic Epact Numbers: U+102E0–U+102FF
Rumi Numeral Forms: U+10E60–U+10E7E
CJK Numerals
CJK Ideographic Traditional Numerals
Chinese Counting-Rod Numerals
Suzhou-Style Numerals
Fractions
Figure 22-8. Alternate Forms of Vulgar Fractions
Common Indic Number Forms: U+A830–U+A83F
22.4 Superscript and Subscript Symbols
Superscripts and Subscripts: U+2070–U+209F
Parsing of Superscript and Subscript Digits
Standards
Superscripts and Subscripts in Other Blocks
22.5 Mathematical Symbols
Semantics.
Mathematical Property
Mathematical Operators: U+2200–U+22FF
Standards
Encoding Principles
Unifications
Disunifications
Table 22-5. Mathematical Operators Disunified from Punctuation
Greek-Derived Symbols
N-ary Operators
Invisible Operators
Minus Sign
Delimiters
Bidirectional Layout
Other Elements of Mathematical Notation
Supplements to Mathematical Symbols and Arrows
Standards.
Supplemental Mathematical Operators: U+2A00–U+2AFF
Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF
Mathematical Brackets.
Long Division
Fractional Slash and Other Diagonals
Miscellaneous Mathematical Symbols-B: U+2980–U+29FF
Wiggly Fence.
Miscellaneous Symbols and Arrows: U+2B00–U+2B7F
Arrows: U+2190–U+21FF
Bidirectional Layout
Standards
Unifications
Supplemental Arrows
Long Arrows.
Standardized Variants of Mathematical Symbols
Change in Representative Glyphs for U+2278 and U+2279
22.6 Invisible Mathematical Operators
Invisible Separator
Invisible Multiplication
Invisible Plus
Invisible Function Application
22.7 Technical Symbols
Control Pictures: U+2400–U+243F
Code Points for Pictures for Control Codes
Pictures for ASCII Space
Standards
Miscellaneous Technical: U+2300–U+23FF
Keytop Labels.
Floor and Ceiling
Crops and Quine Corners
Figure 22-9. Usage of Crops and Quine Corners
Angle Brackets.
APL Functional Symbols
Symbol Pieces.
Table 22-6. Use of Mathematical Symbol Pieces
Horizontal Brackets
Terminal Graphics Characters.
Decimal Exponent Symbol
Figure 22-10. Usage of the Decimal Exponent Symbol
Dental Symbols.
Metrical Symbols
Electrotechnical Symbols
User Interface Symbols
Standards.
Optical Character Recognition: U+2440–U+245F
Standards
22.8 Geometrical Symbols
Box Drawing and Block Elements
Box Drawing
Block Elements
Standards
Geometric Shapes: U+25A0–U+25FF
Hatched Squares
Lozenge
Use in Mathematics
Standards
Geometric Shapes Extended: U+1F780–U+1F7FF
Table 22-7. Geometric Shape Collections
22.9 Miscellaneous Symbols
Rendering of Emoji Symbols
Color Words in Unicode Character Names
Miscellaneous Symbols and Pictographs
Standards
Weather Symbols
Traffic Signs
Dictionary and Map Symbols
Plastic Bottle Material Code System.
Recycling Symbol for Generic Materials.
Universal Recycling Symbol.
Paper Recycling Symbols.
Gender Symbols
Genealogical Symbols
Game Symbols
Animal Symbols
Cultural Symbols
Hand Symbols
Emoji Modifiers
Miscellaneous Symbols in Other Blocks
Emoticons: U+1F600–U+1F64F
Transport and Map Symbols: U+1F680–U+1F6FF
Dingbats: U+2700–U+27BF
Unifications and Additions.
Ornamental Brackets.
Ornamental Dingbats: U+1F650–U+1F67F
Alchemical Symbols: U+1F700–U+1F77F
Mahjong Tiles: U+1F000–U+1F02F
Domino Tiles: U+1F030–U+1F09F
Playing Cards: U+1F0A0–U+1F0FF
Yijing Hexagram Symbols: U+4DC0–U+4DFF
Tai Xuan Jing Symbols: U+1D300–U+1D356
Monograms
Digrams
Tetragrams
Ancient Symbols: U+10190–U+101CF
Phaistos Disc Symbols: U+101D0–U+101FF
Directionality
22.10 Enclosed and Square
Enclosed Symbols
Square Symbols
Source Standards
Allocation
Decomposition
Casing
Enclosed Alphanumerics: U+2460–U+24FF
Enclosed CJK Letters and Months: U+3200–U+32FF
CJK Compatibility: U+3300–U+33FF
Japanese Era Names
Table 22-8. Japanese Era Names
Enclosed Alphanumeric Supplement: U+1F100–U+1F1FF
Regional Indicator Symbols
Enclosed Ideographic Supplement: U+1F200–U+1F2FF
23 Special Areas and Format Characters
23.1 Control Codes
Representing Control Sequences
Escape Sequences
Specification of Control Code Semantics
Table 23-1. Control Codes Specified in the Unicode Standard
Newline Function
23.2 Layout Controls
Line and Word Breaking
No-Break Space
Word Joiner
Zero Width No-Break Space
Zero Width Space
Table 23-2. Letter Spacing
Zero-Width Spaces and Joiner Characters
Hyphenation.
Line and Paragraph Separator
Cursive Connection and Ligatures
Joiner
Non-joiner
Cursive Connection
Figure 23-1. Prevention of Joining
Figure 23-2. Exhibition of Joining Glyphs in Isolation
Examples.
Figure 23-3. Effect of Intervening Joiners
Transparency
Joiner and Non-joiner in Indic Scripts
Implementation Notes.
Filtering Joiner and Non-joiner
Combining Grapheme Joiner
Blocking Reordering
CGJ and Collation
Rendering
CGJ and Joiner Characters
Bidirectional Ordering Controls
Table 23-3. Bidirectional Ordering Controls
Stateful Format Controls
Table 23-4. Paired Stateful Controls
Table 23-5. Paired Stateful Controls (Deprecated)
23.3 Deprecated Format Characters
Symmetric Swapping
Character Shaping Selectors
Numeric Shape Selectors
23.4 Variation Selectors
Variation Sequence
CJK Compatibility Ideographs
Mongolian
23.5 Private-Use Characters
Properties.
Normalization.
Private Use Area: U+E000–U+F8FF
Encoding Structure.
Corporate Use Subarea.
End-User Subarea.
Allocation of Subareas.
Supplementary Private Use Areas
Encoding Structure.
23.6 Surrogates Area
High-Surrogate
Low-Surrogate
Private-Use High-Surrogates
23.7 Noncharacters
U+FFFF and U+10FFFF
U+FFFE
23.8 Specials
Byte Order Mark (BOM): U+FEFF
Table 23-6. Unicode Encoding Scheme Signatures
Table 23-7. U+FEFF Signature in Other Charsets
Specials: U+FFF0–U+FFF8
Annotation Characters: U+FFF9–U+FFFB
Figure 23-4. Annotation Characters
Conformance
Use in Plain Text
Lexical Restrictions
Formatting
Input
Collation
Bidirectional Text
Replacement Characters: U+FFFC–U+FFFD
U+FFFC
U+FFFD
23.9 Tag Characters
Tag Characters: U+E0000–U+E007F
Deprecated Use for Language Tagging
Syntax for Embedding Tags
Tag Identification.
Tag Termination.
Language Tags.
Tag Scope and Nesting.
Figure 23-5. Tag Characters
Canceling Tag Values.
Working with Language Tags
Avoiding Language Tags.
Higher-Level Protocols.
Effect of Tags on Interpretation of Text.
Display.
Processing.
Range Checking for Tag Characters.
Editing and Modification.
Dangers of Incomplete Support.
Unicode Conformance Issues
Formal Tag Syntax
24 About the Code Charts
24.1 Character Names List
Images in the Code Charts and Character Lists
Fonts
Alternative Forms
Orientation
Special Characters and Code Points
Combining Characters
Dashed Box Convention
Reserved Characters
Noncharacters
Deprecated Characters
Character Names
Informative Aliases
Normative Aliases
Cross References
Explicit Inequality
Other Linguistic Relationships
Information About Languages
Case Mappings
Decompositions
Standardized Variation Sequences
Subheads
24.2 CJK Ideographs
CJK Unified Ideographs
Table 24-1. IRG Sources
Chart for the Main CJK Block
Figure 24-1. CJK Chart Format for the Main CJK Block
Charts for CJK Extensions
Figure 24-2. CJK Chart Format for CJK Extension A
Figure 24-3. CJK Chart Format for CJK Extension B
Compatibility Ideographs
Figure 24-4. CJK Chart Format for Compatibility Ideographs
24.3 Hangul Syllables
A Notational Conventions
Code Points
Character Names
Character Blocks
Sequences
Rendering
Figure A-1. Example of Rendering
Properties and Property Values
Miscellaneous
Extended BNF
Table A-1. Extended BNF
Character Classes
Table A-2. Character Class Examples
Operators
Table A-3. Operators
B Unicode Publications and Resources
B.1 The Unicode Consortium
The Unicode Technical Committee
Other Activities
B.2 Unicode Publications
B.3 Unicode Technical Standards
UTS #6: A Standard Compression Scheme for Unicode
UTS #10: Unicode Collation Algorithm
UTS #18: Unicode Regular Expressions
UTS #22: Character Mapping Markup Language (CharMapML)
UTS #35: Unicode Locale Data Markup Language (LDML)
UTS #37: Unicode Ideographic Variation Database
UTS #39: Unicode Security Mechanisms
UTS #46: Unicode IDNA Compatibility Processing
B.4 Unicode Technical Reports
UTR #16: UTF-EBCDIC
UTR #17: Unicode Character Encoding Model
UTR #20: Unicode in XML and Other Markup Languages
UTR #23: The Unicode Character Property Model
UTR #25: Unicode Support for Mathematics
UTR #26: Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8)
UTR #33: Unicode Conformance Model
UTR #36: Unicode Security Considerations
UTR #50: Unicode Vertical Text Layout
UTR #51: Unicode Emoji
B.5 Unicode Technical Notes
B.6 Other Unicode Online Resources
Unicode Online Resources
Unicode Web Site
Unicode Anonymous FTP Site
Charts
Character Index
Conferences
E-mail Discussion List
FAQ (Frequently Asked Questions)
Glossary
Online Unicode Character Database
Online Unihan Database
Policies
Unicode Common Locale Data Repository (CLDR)
Updates and Errata
Versions
Where Is My Character?
How to Contact the Unicode Consortium
C Relationship to ISO/IEC 10646
C.1 History
Table C-1. Timeline
C.2 Encoding Forms in ISO/IEC 10646
UCS-4
UCS-2
Zero Extending
Table C-2. Zero Extending
C.3 UTF-8 and UTF-16
UTF-8
UTF-16
C.4 Synchronization of the Standards
C.5 Identification of Features for Unicode
C.6 Character Names
C.7 Character Functional Specifications
D Changes from Previous Versions
D.1 Versions of the Unicode Standard
Table D-1. Versions of Unicode and ISO/IEC 10646
Table D-2. Allocation of Code Points by Type (Versions 1.0.0 to 3.0)
Table D-3. Allocation of Code Points by Type (Versions 3.1 to 5.1)
Table D-4. Allocation of Code Points by Type (Versions 5.2 to 7.0)
Table D-5. Allocation of Code Points by Type (Version 8.0)
D.2 Clause and Definition Updates
Table D-6. Version 6.1 Clause and Definition Updates
Table D-7. Version 6.3 Clause and Definition Updates
E Han Unification History
E.1 Development of the URO
E.2 Ideographic Rapporteur Group
E.3 CJK Sources
Table E-1. G Source Documentation
Table E-2. H Source Documentation
Table E-3. M Source Documentation
Table E-4. T Source Documentation
Table E-5. J Source Documentation
Table E-6. K Source Documentation
Table E-7. KP Source Documentation
Table E-8. V Source Documentation
Table E-9. U Source Documentation
Omission of Repertoire for Some Sources
F Documentation of CJK Strokes
Table F-1. CJK Strokes
References
R.1 Source Standards and Specifications
R.2 Source Dictionaries for Han
R.3 Other Script Sources
R.4 Selected Resources: Technical
R.5 Selected Resources: Other
General Index
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
This page contains hyperlinks to The Unicode Standard, Version 8.0. The Unicode 8.0.0 page lists the contents with links to each PDF file.