Unicode 6.0 Web Bookmarks
About this page
This page contains hyperlinks to The Unicode Standard, Version 6.0. The Unicode 6.0.0 page lists the contents with links to each PDF file.
PrefaceWhy Unicode?What's New?Support for Languages and Symbol SetsConformance UpdatesImportant Property and Behavioral UpdatesBlock DescriptionsData File Descriptions and UpdatesStability Policy UpdatesCJKIndustry StandardsDetailed Change InformationOrganization of This StandardConcepts, Architecture, Conformance, and GuidelinesCharacter Block DescriptionsCode ChartsAppendicesReferences and IndexGlossary and Character IndexUnicode Standard AnnexesThe Unicode Character DatabaseUnicode Code ChartsUnicode Technical Standards and Unicode Technical ReportsUpdates and ErrataAcknowledgements1 IntroductionFigure 1-1. Wide ASCII1.1 CoverageStandards CoverageNew Characters1.2 Design GoalsFigure 1-2. Unicode Compared to the 2022 Framework1.3 Text HandlingCharacters and GlyphsText Elements2 General Structure2.1 Architectural ContextBasic Text ProcessesText Elements, Characters, and Text ProcessesFigure 2-1. Text Elements and CharactersText Processes and EncodingCharacter Identity2.2 Unicode Design PrinciplesTable 2-1. The 10 Unicode Design PrinciplesUniversalityEfficiencyCharacters, Not GlyphsFigure 2-2. Characters Versus GlyphsTable 2-2. User-Perceived Characters with Multiple Code PointsFigure 2-3. Unicode Character Code to Rendered GlyphsSemanticsPlain TextLogical OrderFigure 2-4. Bidirectional OrderingFigure 2-5. Writing Direction and NumbersUnificationFigure 2-6. Typeface Variation for the Bone CharacterDynamic CompositionFigure 2-7. Dynamic CompositionEquivalent SequencesStabilityConvertibility2.3 Compatibility CharactersUsageAllocationCompatibility VariantsCompatibility Decomposable CharactersCompatibility Character Versus Compatibility Decomposable Character2.4 Code Points and CharactersFigure 2-8. Abstract and Encoded CharactersTypes of Code PointsTable 2-3. 
Types of Code PointsControl CodesNoncharactersPrivate UseSurrogatesRestricted InterchangeCode Point Semantics2.5 Encoding FormsNon-overlapFigure 2-9. Overlap in Legacy Mixed-Width EncodingsFigure 2-10. Boundaries and InterpretationConformanceExamplesFigure 2-11. Unicode Encoding FormsUTF-32Fixed WidthPreferred UsageUTF-16Optimized for BMPSupplementary Characters and SurrogatesPreferred UsageOriginCollationUTF-8Byte-OrientedVariable WidthASCII TransparencyPreferred UsageSelf-synchronizingComparison of the Advantages of UTF-32, UTF-16, and UTF-8UTF-32 Versus UTF-16Characters Versus Code PointsUTF-8Binary Sorting2.6 Encoding SchemesByte OrderTable 2-4. The Seven Unicode Encoding SchemesEncoding Scheme Versus Encoding FormExamplesFigure 2-12. Unicode Encoding Schemes2.7 Unicode Strings2.8 Unicode AllocationPlanesBasic Multilingual PlaneSupplementary Multilingual PlaneSupplementary Ideographic PlaneSupplementary Special-purpose PlanePrivate Use PlanesAllocation Areas and Character BlocksAllocation AreasBlocksAllocation OrderAssignment of Code Points2.9 Details of AllocationFigure 2-13. Unicode AllocationPlane 0 (BMP)Figure 2-14. Allocation on the BMPASCII and Latin-1 Compatibility AreaGeneral Scripts AreaPunctuation and Symbols AreaSupplementary General Scripts AreaCJK Miscellaneous AreaCJKV Ideographs AreaGeneral Scripts Area (Asia and Africa)Hangul AreaSurrogates AreaPrivate Use AreaCompatibility and Specials AreaPlane 1 (SMP)Figure 2-15. Allocation on Plane 1General Scripts AreasGeneral Scripts Areas (RTL)Cuneiform and Hieroglyphic AreaIdeographic Scripts AreaSymbols AreasPlane 2 (SIP)Other Planes2.10 Writing DirectionFigure 2-16. Writing DirectionsBidirectionalVerticalBoustrophedonOther Historical Directionalities2.11 Combining CharactersCombining CharactersDiacriticsSymbol DiacriticsEnclosing Combining MarksFigure 2-17. Combining Enclosing Marks for SymbolsScript-Specific Combining CharactersSequence of Base Characters and DiacriticsFigure 2-18. 
Sequence of Base Characters and DiacriticsOrderingIndic Vowel SignsFigure 2-19. Reordered Indic Vowel SignsPropertiesFigure 2-20. Properties and Combining Character SequencesMultiple Combining CharactersFigure 2-21. Stacking SequencesTable 2-5. Interaction of Combining CharactersTable 2-6. Nondefault StackingLigated Multiple Base CharactersFigure 2-22. Ligated Multiple Base CharactersExhibiting Nonspacing Marks in Isolation"Characters" and Grapheme Clusters2.12 Equivalent Sequences and NormalizationFigure 2-23. Equivalent SequencesNormalizationFigure 2-24. Canonical OrderingDecompositionsTypes of DecomposablesExamplesFigure 2-25. Types of DecomposablesNon-decomposition of Overlaid DiacriticsSecurity Issue2.13 Special Characters and NoncharactersSpecial Noncharacter Code PointsByte Order Mark (BOM)Unicode SignatureLayout and Format Control CharactersThe Replacement CharacterControl Codes2.14 Conforming to the Unicode StandardCharacteristics of Conformant ImplementationsUnacceptable BehaviorAcceptable BehaviorSupported Subsets3 Conformance3.1 Versions of the Unicode StandardStabilityVersion NumberingMajor and Minor VersionsUpdate VersionErrata and CorrigendaErrataCorrigendaReferences to the Unicode StandardPrecision in Version CitationReferences to Unicode Character PropertiesReferences to Unicode Algorithms3.2 Conformance RequirementsCode Points Unassigned to Abstract CharactersInterpretationModificationCharacter Encoding FormsCharacter Encoding SchemesBidirectional TextNormalization FormsNormative ReferencesUnicode AlgorithmsDefault Casing AlgorithmsUnicode Standard Annexes3.3 SemanticsDefinitionsCharacter Identity and Semantics3.4 Characters and EncodingTable 3-1. Named Unicode Algorithms3.5 PropertiesTypes of PropertiesProperty ValuesClassification of Properties by Their ValuesProperty StatusTable 3-2. Normative Character PropertiesTable 3-3. 
Informative Character PropertiesContext DependenceStability of PropertiesSimple and Derived PropertiesProperty AliasesPrivate Use3.6 CombinationCombining Character SequencesGrapheme ClustersApplication of Combining MarksFigure 3-1. Enclosing MarksCombining Marks and Korean Syllables3.7 DecompositionCompatibility DecompositionCanonical Decomposition3.8 Surrogates3.9 Unicode Encoding FormsTable 3-4. Examples of Unicode Encoding FormsUTF-32UTF-16Table 3-5. UTF-16 Bit DistributionUTF-8Table 3-6. UTF-8 Bit DistributionTable 3-7. Well-Formed UTF-8 Byte SequencesEncoding Form ConversionConstraints on Conversion ProcessesBest Practices for Using U+FFFDTable 3-8. Use of U+FFFD in UTF-8 Conversion3.10 Unicode Encoding SchemesTable 3-9. Summary of UTF-16BE, UTF-16LE, and UTF-16Table 3-10. Summary of UTF-32BE, UTF-32LE, and UTF-323.11 Normalization FormsNormalization StabilityCombining ClassesSpecification of Unicode Normalization FormsStartersTable 3-11. Combining Marks and Starter StatusCanonical Ordering AlgorithmTable 3-12. Reorderable PairsCanonical Composition AlgorithmDefinition of Normalization Forms3.12 Conjoining Jamo BehaviorDefinitionsHangul Syllable Boundary DeterminationTable 3-13. Hangul Syllable No-Break RulesStandard Korean SyllablesTransforming into Standard Korean SyllablesExamples.Table 3-14. Korean Syllable Break ExamplesHangul Syllable CompositionExampleHangul Syllable DecompositionExampleHangul Syllable Name Generation3.13 Default Case AlgorithmsTailoringDefinitionsTable 3-15. Context Specification for CasingDefault Case ConversionDefault Case FoldingDefault Case DetectionTable 3-16. Case Detection ExamplesDefault Caseless Matching4 Character PropertiesStatus and AttributesConsistency of Properties4.1 Unicode Character DatabaseUnihan DatabaseStabilityAliasesUCD in XMLOnline Availability4.2 CaseDefinitions of Case and CasingTable 4-1. Relationship of Casing DefinitionsTable 4-2. Case Function Values for StringsCase MappingTable 4-3. 
Sources for Case Mapping Information4.3 Combining ClassesFigure 4-1. Positions of Common Combining MarksReordrant, Split, and Subjoined Combining MarksReordrant Class Zero Combining MarksTable 4-4. Class Zero Combining Marks—ReordrantTable 4-5. Thai, Lao, and Tai Viet Logical Order ExceptionsSplit Class Zero Combining MarksTable 4-6. Class Zero Combining Marks—SplitSubjoined Class Zero Combining MarksTable 4-7. Class Zero Combining Marks—SubjoinedStrikethrough Class Zero Combining MarksTable 4-8. Class Zero Combining Marks—Strikethrough4.4 Directionality4.5 General CategoryTable 4-9. General Category4.6 Numeric ValueDecimal DigitsScript-Specific DigitsIdeographic Numeric ValuesTable 4-10. Primary Numeric IdeographsTable 4-11. Ideographs Used as Accounting Numbers4.7 Bidi Mirrored4.8 NameStabilityCharacter Name SyntaxNames as IdentifiersCharacter Name MatchingNamed Character SequencesCharacter Name AliasesUnicode Name PropertyFormal Definition of the Name PropertyName UniquenessInterpretation of Field 1 of UnicodeData.txtControl CodesCode Point LabelsTable 4-12. Construction of Code Point LabelsUse of Character Names in APIs and User InterfacesUse in APIsUser Interfaces4.9 Unicode 1.0 Names4.10 Letters, Alphabetic, and IdeographicLetters and SyllablesAlphabeticIdeographic4.11 Properties Related to Text Boundaries4.12 Characters with Unusual PropertiesTable 4-13. Unusual Properties5 Implementation Guidelines5.1 Data Structures for Character ConversionIssuesMultistage TablesFlat Tables.RangesTwo-Stage TablesFigure 5-1. Two-Stage TablesOptimized Two-Stage TableMultistage Table Tuning5.2 Programming Languages and Data TypesUnicode Data Types for CANSI/ISO C wchar_t5.3 Unknown and Missing CharactersReserved and Private-Use Character CodesInterpretable but Unrenderable CharactersDefault Property ValuesDefault Ignorable Code PointsInteracting with Downlevel Systems5.4 Handling Surrogate Pairs in UTF-16Strategies for Surrogate Pair Support5.5 Handling NumbersFigure 5-2. 
CJK Ideographic Numbers5.6 NormalizationAlternative SpellingsNormalizationFigure 5-3. Normalization5.7 Compression5.8 Newline GuidelinesDefinitionsTable 5-1. Hex Values for AcronymsEncodingNotationEBCDICNewline FunctionTable 5-2. NLF Platform CorrelationsLine Separator and Paragraph SeparatorRecommendationsConverting from Other Character Code SetsInterpreting Characters in TextConverting to Other Character Code SetsInput and OutputPage Separator5.9 Regular Expressions5.10 Language Information in Plain TextRequirements for Language TaggingLanguage Tags and Han UnificationTypical Scenarios5.11 Editing and SelectionConsistent Text ElementsCluster BoundariesFigure 5-4. Consistent Character BoundariesStacked BoundariesAtomic Character Boundaries.Linear BoundariesNonlinear Boundaries5.12 Strategies for Handling Nonspacing MarksRenderingOther ProcessesKeyboard InputFigure 5-5. Dead Keys Versus Handwriting SequenceTruncationFigure 5-6. Truncating Grapheme Clusters5.13 Rendering Nonspacing MarksFigure 5-7. Inside-Out RuleFallback RenderingFigure 5-8. Fallback RenderingBidirectional PositioningFigure 5-9. Bidirectional PlacementJustificationFigure 5-10. JustificationCanonical EquivalenceTable 5-3. Typing Order Differing from Canonical OrderTable 5-4. Permuting Combining Class WeightsPositioning MethodsPositioning with LigaturesFigure 5-11. Positioning with LigaturesPositioning with Contextual FormsFigure 5-12. Positioning with Contextual FormsPositioning with Enhanced KerningFigure 5-13. Positioning with Enhanced Kerning5.14 Locating Text Element Boundaries5.15 Identifiers5.16 Sorting and SearchingCulturally Expected Sorting and SearchingLanguage-Insensitive SortingSearchingSublinear SearchingFigure 5-14. Sublinear Searching5.17 Binary OrderUTF-8 in UTF-16 OrderUTF-16 in UTF-8 Order5.18 Case MappingsTitlecasingComplications for Case MappingChange in LengthGreek iota subscriptContext-dependent Case MappingsLocale-dependent Case MappingsFigure 5-15. 
Uppercase Mapping for Turkish IFigure 5-16. Lowercase Mapping for Turkish ICaseless CharactersGerman sharp sFigure 5-17. Casing of German Sharp SReversibilityCaseless MatchingStabilityNormalization and CasingTable 5-5. Casing and Normalization in Strings5.19 Mapping Compatibility VariantsConfusables5.20 Unicode SecurityAlternate EncodingsSpoofing5.21 Default Ignorable Code PointsStateful Format ControlsTable 5-6. Paired Stateful ControlsTable 5-7. Paired Stateful Controls (Deprecated)5.22 Best Practice for U+FFFD Substitution6 Writing Systems and PunctuationScripts and BlocksScripts and Writing SystemsPunctuation6.1 Writing SystemsAlphabetsAbjadsSyllabariesAbugidasFigure 6-1. Overriding Inherent VowelsLogosyllabariesTypology of Scripts in the Unicode StandardTable 6-1. Typology of Scripts in the Unicode StandardNotational Systems6.2 General PunctuationUse and InterpretationRenderingWriting DirectionFigure 6-2. Forms of CJK PunctuationLayout ControlsEncoding Characters with Multiple Semantic ValuesBlocks Devoted to PunctuationFormat Control CharactersSpace CharactersTable 6-2. Unicode Space CharactersDashes and HyphensTable 6-3. Unicode Dash CharactersSoft HyphenTilde.Dictionary Abbreviation SymbolsPaired PunctuationMirroring of Paired Punctuation.Quotation Marks and BracketsLanguage-Based Usage of Quotation MarksEuropean UsageFigure 6-3. European Quotation MarksEast Asian UsageTable 6-4. East Asian Quotation MarksGlyph Variation.Figure 6-4. Asian Quotation MarksTable 6-5. Opening and Closing FormsOverloaded Character CodesConsequences for SemanticsApostrophesLetter ApostrophePunctuation ApostropheOther PunctuationHyphenation PointWord Separator Middle DotFraction SlashSpacing Overscores and UnderscoresDoubled PunctuationPeriod or Full StopEllipsisVertical EllipsisLeader DotsOther Basic Latin Punctuation MarksCanonical Equivalence Issues for Greek PunctuationBulletsParagraph MarksNumeric Separators.Commercial MinusAt SignTable 6-6. 
Names for the @Archaic Punctuation and Editorial MarksArchaic PunctuationEditorial MarksNew Testament Editorial MarksAncient Greek Editorial MarksFigure 6-5. Examples of Ancient Greek Editorial MarksFigure 6-6. Use of Greek ParagraphosDouble Oblique HyphenIndic PunctuationDandasTable 6-7. Unicode Danda CharactersCJK PunctuationFigure 6-7. CJK ParenthesesSesame DotsUnknown or Unavailable IdeographsCJK Compatibility FormsVertical FormsStyled Overscores and UnderscoresSmall Form VariantsFullwidth and Halfwidth Variants7 European Alphabetic Scripts7.1 LatinLanguagesDiacritical Marks.Alternative Glyphs.Figure 7-1. Alternative Glyphs in LatinVariations in Diacritical MarksLatvian CedillaCedilla and Comma Below in Turkish and RomanianExceptional Case PairsDiacritics on i and jFigure 7-2. Diacritics on i and jVietnameseFigure 7-3. Vietnamese Letters and Tone MarksStandards.Related CharactersLetters of Basic Latin: U+0041–U+007ALetters of the Latin-1 Supplement: U+00C0–U+00FFLanguagesOrdinalsLatin Extended-A: U+0100–U+017FCompatibility DigraphsLanguagesLatin Extended-B: U+0180–U+024FArrangementCroatian Digraphs Matching Serbian Cyrillic LettersPinyin Diacritic–Vowel CombinationsCase PairsCaseless LettersGlottal StopIPA Extensions: U+0250–U+02AFStandardsUnificationsIPA AlternatesCase PairsTypographic VariantsAffricate Digraph LigaturesArrangementPhonetic Extensions: U+1D00–U+1DBFTypographic Features of the UPA.Other Phonetic ExtensionsDigraph for thLatin Extended Additional: U+1E00–U+1EFFCapital Sharp SVietnamese Vowel Plus Tone Mark CombinationsLatin Extended-C: U+2C60–U+2C7FUighurClaudian LettersLatin Extended-D: U+A720–U+A7FFEgyptological TransliterationHistoric Mayan LettersEuropean Medievalist LettersInsular and Celticist LettersOrthographic Letter AdditionsLatvian LettersAncient Roman Epigraphic LettersLatin Ligatures: U+FB00–U+FB067.2 GreekGreek: U+0370–U+03FFStandardsPolytonic GreekNonspacing MarksTable 7-1. 
Nonspacing Marks Used with GreekIotaVariant LetterformsFigure 7-4. Variations in Greek Capital Letter UpsilonRepresentative Glyphs for Greek PhiGreek Letters as SymbolsSymbols Versus NumbersCompatibility PunctuationHistoric LettersCoptic-Unique LettersRelated CharactersGreek Extended: U+1F00–U+1FFFSpacing DiacriticsTable 7-2. Greek Spacing and Nonspacing PairsAncient Greek Numbers: U+10140–U+1018FAcrophonic NumeralsOther Numerical SymbolsSymbol for Zero7.3 CopticDevelopment of the Coptic ScriptCasingFont StylesCharacters for Cryptogrammic UseCrossed SheiSupralineationCombining Diacritical MarksPunctuationNumerical Use of LettersFigure 7-5. Coptic Numerals7.4 CyrillicHistoric LetterformsGlagoliticCyrillic: U+0400–U+04FFStandardsExtended CyrillicAbkhasianPalochkaCyrillic Supplement: U+0500–U+052FKomiKurdish LettersCyrillic Extended-A: U+2DE0–U+2DFFTitlo LettersCyrillic Extended-B: U+A640–U+A69FNumeric Enclosing SignsOld Abkhasian Letters7.5 GlagoliticGlyph FormsOrderingPunctuation and DiacriticsNumerical Use of Letters7.6 ArmenianOrthographyUser CommunityPunctuationPreferred CharactersLigatures7.7 GeorgianScript FormsCase FormsMtavruli StyleFigure 7-6. Georgian Scripts and CasingPunctuationHistoric Punctuation7.8 Modifier LettersCase and Modifier LettersGeneral CategoryBlocksNamesSpacing Modifier Letters: U+02B0–U+02FFPhonetic UsageEncoding PrinciplesSuperscript LettersSpacing Clones of DiacriticsRhotic HookTone LettersFigure 7-7. Tone LettersModifier Tone Letters: U+A700–U+A71F7.9 Combining MarksSequence of Base Letters and Combining MarksMultiple SemanticsGlyphic VariationOverlaid DiacriticsMarks as Spacing CharactersSpacing Clones of Diacritical MarksRelationship to ISO/IEC 8859-1Diacritics Positioned Over Two Base CharactersFigure 7-8. Double DiacriticsFigure 7-9. Positioning of Double DiacriticsFigure 7-10. Use of CGJ with Double DiacriticsCombining Marks with LigaturesFigure 7-11. 
Interaction of Combining Marks with LigaturesCombining Diacritical Marks: U+0300–U+036FStandardsUnderlining and OverliningCombining Diacritical Marks Supplement: U+1DC0–U+1DFFCombining Marks for Symbols: U+20D0–U+20FFFigure 7-12. Use of Vertical Line Overlay for NegationEnclosing MarksCombining Half Marks: U+FE20–U+FE2FFigure 7-13. Double Diacritics and Half MarksCombining Marks in Other Blocks8 Middle Eastern Scripts8.1 HebrewHebrew: U+0590–U+05FFDirectionalityCursive.StandardsVowels and Other Marks of PronunciationShin and SinFinal (Contextual Variant) LetterformsYiddish DigraphsPunctuationCantillation MarksPositioningMetegAtnah Hafukh and Qamats QatanHolam Male and Holam HaserPuncta ExtraordinariaNun HafukhaCurrency SymbolAlphabetic Presentation Forms: U+FB1D–U+FB4FUse of Wide Letters8.2 ArabicArabic: U+0600–U+06FFFigure 8-1. Directionality and Cursive ConnectionDirectionalityStandardsEncoding PrinciplesPunctuationThe Non-joiner and the JoinerFigure 8-2. Using a JoinerFigure 8-3. Using a Non-joinerFigure 8-4. Combinations of Joiners and Non-joinersHarakat (Vowel) Nonspacing MarksFigure 8-5. Placement of HarakatArabic-Indic DigitsTable 8-1. Arabic Digit NamesTable 8-2. Glyph Variation in Eastern Arabic-Indic DigitsExtended Arabic LettersKoranic Annotation SignsAdditional Vowel MarksHonorificsArabic Mathematical SymbolsDate SeparatorFull StopCurrency SymbolsEnd of AyahOther Signs Spanning NumbersFigure 8-6. Arabic Year SignPoetic Verse SignArabic Cursive JoiningMinimum Rendering RequirementsJoining TypesTable 8-3. Primary Arabic Joining TypesTable 8-4. Derived Arabic Joining TypesJoining RulesTable 8-5. Arabic Glyph TypesArabic LigaturesLigature ClassesTable 8-6. Arabic Obligatory Ligature Joining GroupsLigature RulesTable 8-7. Arabic Ligature NotationOptional FeaturesArabic Joining GroupsDual-JoiningTable 8-8. Dual-Joining Arabic CharactersRight-JoiningTable 8-9. Right-Joining Arabic CharactersLetter hehLetter yehTable 8-10. 
Forms of the Arabic Letter yehCombining Hamza AboveJawiArabic Supplement: U+0750–U+077FMarwariArabic Presentation Forms-A: U+FB50–U+FDFFOrnate ParenthesesNuktasArabic Presentation Forms-B: U+FE70–U+FEFFSpacing and Tatweel Forms of Arabic DiacriticsZero Width No-Break Space8.3 SyriacSyriac: U+0700–U+074FSyriac LanguageLanguages Using the Syriac Script.ShapingDirectionalitySyriac Type StylesCharacter NamesSyriac Abbreviation MarkFigure 8-7. Syriac AbbreviationFigure 8-8. Use of SAMLigatures and Combining CharactersDiacritic Marks and VowelsPunctuationDigitsHarklean MarksDalath and RishSemkathVowel MarksMiscellaneous Diacritics.Table 8-11. Miscellaneous Syriac Diacritic UseUse of Characters of the Arabic BlockSyriac ShapingMinimum Rendering RequirementsJoining TypesTable 8-12. Syriac Final Alaph Glyph TypesSyriac Character Joining GroupsTable 8-13. Dual-Joining Syriac CharactersTable 8-14. Right-Joining Syriac CharactersTable 8-15. Syriac Alaph Glyph FormsLigature ClassesTable 8-16. Syriac Ligatures8.4 SamaritanDirectionalityVowel SignsConsonant ModifiersPunctuationTable 8-17. Samaritan Performative Punctuation Marks8.5 ThaanaDirectionalityVowelsTable 8-18. Thaana Glyph PlacementNumeralsPunctuationCharacter Names and Arrangement9 South Asian Scripts-I9.1 DevanagariDevanagari: U+0900–U+097FStandardsEncoding PrinciplesPrinciples of the Devanagari ScriptRendering Devanagari CharactersConsonant LettersIndependent Vowel LettersDependent Vowel Signs (Matras)Vowel LettersTable 9-1. Devanagari Vowel LettersVirama (Halant)Figure 9-1. Dead Consonants in DevanagariConsonant ConjunctsFigure 9-2. Conjunct Formations in DevanagariExplicit Virama (Halant)Figure 9-3. Preventing Conjunct Forms in DevanagariExplicit Half-ConsonantsFigure 9-4. Half-Consonants in DevanagariFigure 9-5. Independent Half-Forms in DevanagariFigure 9-6. Half-Consonants in OriyaConsonant FormsFigure 9-7. 
Consonant Forms in Devanagari and OriyaRendering DevanagariRules for RenderingNotationDead Consonant RuleConsonant RA RulesModifier Mark RulesLigature RulesMemory Representation and Rendering OrderFigure 9-8. Rendering Order in DevanagariSample Half-FormsTable 9-2. Sample Devanagari Half-FormsSample LigaturesTable 9-3. Sample Devanagari LigaturesSample Half-Ligature FormsTable 9-4. Sample Devanagari Half-Ligature FormsLanguage-Specific AllographsFigure 9-9. Marathi AllographsCombining MarksDevanagari Digits, Punctuation, and SymbolsDigitsPunctuationOther SymbolsExtensions in the Main Devanagari BlockSindhi LettersKonkaniBodo, Dogri, and MaithiliFigure 9-10. Use of Apostrophe in Bodo, Dogri and MaithiliFigure 9-11. Use of Avagraha in DogriKashmiri LettersPrishthamatra OrthographyTable 9-5. Prishthamatra OrthographyDevanagari Extended: U+A8E0–U+A8FFCantillation Marks for the SāmavedaMarks of NasalizationEditorial MarksVedic Extensions: U+1CD0–U+1CFFTone MarksDiacritics for the Visarga.Nasalization MarksArdhavisarga9.2 Bengali (Bangla)Virama (Hasant)Vowel LettersTable 9-6. Bengali Vowel LettersTwo-Part Vowel SignsSpecial CharactersHistoric CharactersRendering BehaviorConsonant-Vowel LigaturesTable 9-7. Bengali Consonant-Vowel CombinationsFigure 9-12. Requesting Bengali Consonant-Vowel LigatureFigure 9-13. Blocking Bengali Consonant-Vowel LigatureKhiyaKhanda Ta.Figure 9-14. Bengali Syllable ttaYa-phalaaInteraction of Repha and Ya-phalaaPunctuationTruncationTable 9-8. Use of Apostrophe in Bangla9.3 GurmukhiEncoding PrinciplesVowel LettersTable 9-9. Gurmukhi Vowel LettersTonesOrderingRendering BehaviorTable 9-10. Gurmukhi ConjunctsTable 9-11. Additional Pairin and Addha Forms in GurmukhiTable 9-12. Use of Joiners in GurmukhiOther SymbolsPunctuation9.4 GujaratiVowel LettersTable 9-13. Gujarati Vowel LettersRendering BehaviorTable 9-14. Gujarati ConjunctsPunctuation9.5 OriyaSpecial CharactersVowel LettersTable 9-15. Oriya Vowel LettersRendering BehaviorTable 9-16. 
Oriya ConjunctsConsonant FormsVowelsTable 9-17. Oriya Vowel PlacementOriya VA and WA.Punctuation and SymbolsFraction Characters9.6 TamilTamil: U+0B80–U+0BFFVirama (Pulli)Figure 9-15. Kssa Ligature in TamilRendering of the Tamil ScriptTamil VowelsIndependent Versus Dependent VowelsLeft-Side VowelsTable 9-18. Tamil Vowel ReorderingTwo-Part VowelsFigure 9-16. Tamil Two-Part VowelsTable 9-19. Tamil Vowel Splitting and ReorderingFigure 9-17. Vowel Reordering Around a Tamil ConjunctTamil LigaturesLigatures with Vowel iFigure 9-18. Tamil Ligatures with iLigatures with Vowel uTable 9-20. Tamil Ligatures with uFigure 9-19. Spacing Forms of Tamil uLigatures with raFigure 9-20. Tamil Ligatures with raLigatures with aa in Traditional Tamil OrthographyFigure 9-21. Tamil Ligatures with aaFigure 9-22. Tamil Ligatures with oLigatures with ai in Traditional Tamil OrthographyFigure 9-23. Tamil Ligatures with aiFigure 9-24. Vowel ai in Modern TamilTamil aythamPunctuationTamil Named Character SequencesTable 9-21. Tamil Vowels, Consonants, and Syllables9.7 TeluguVowel LettersTable 9-22. Telugu Vowel LettersRendering BehaviorSpecial CharactersFractionsPunctuation9.8 KannadaKannada: U+0C80–U+0CFFPrinciples of the Kannada ScriptVowel LettersTable 9-23. Kannada Vowel LettersConsonant ConjunctsSpecial CharactersKannada Letter LLLARendering KannadaExplicit Virama (Halant)Consonant Clusters Involving RAModifier Mark RulesAvagraha SignPunctuation9.9 MalayalamVowel LettersTable 9-24. Malayalam Vowel LettersRendering BehaviorTable 9-25. Malayalam Orthographic ReformTable 9-26. Malayalam ConjunctsTable 9-27. Candrakala ExamplesChillu CharactersTable 9-28. Atomic Encoding of Malayalam ChillusSpecial Cases Involving raTable 9-29. Malayalam /rr/ and /tt/Table 9-30. Malayalam /nr/ and /nt/Dot RephHistoric CharactersSpecial CharactersPunctuation10 South Asian Scripts-II10.1 SinhalaVowel LettersTable 10-1. 
Sinhala Vowel LettersOther Letters for Tamil.Historical Symbols.10.2 TibetanGeneral Principles of the Tibetan ScriptFigure 10-1. Tibetan Syllable StructureConsonantsVowelsCoding OrderAllographical ConsiderationsHead Position "ra"Full-Form "ra" in Head Position.Subjoined Position "wa", "ya", and "ra"Halanta (Srog-Med)Line Breaking ConsiderationsTibetan PunctuationSvasti SignsOther CharactersTibetan Half-NumbersTibetan Transliteration and Transcription of Other LanguagesOther SignsTraditional Text Formatting and Line JustificationFigure 10-2. Justifying Tibetan TseksTibetan Shorthand Abbreviations (bskungs-yig) and Limitations of the Encoding10.3 LepchaStructureVowelsMedialsRetroflex ConsonantsOrdering of Syllable ComponentsTable 10-2. Lepcha Syllabic StructureRenderingDigitsPunctuation10.4 Phags-paHistoryBasic StructureSyllable DivisionCandrabinduFigure 10-3. Phags-pa Syllable OmAlternate LettersNumbersPunctuationPositional VariantsTable 10-3. Phags-pa Positional Forms of I, U, E, and OMirrored VariantsTable 10-4. Contextual Glyph Mirroring in Phags-paTable 10-5. Phags-pa Standardized VariantsFigure 10-4. Phags-pa Reversed Shaping10.5 LimbuConsonantsVowelsVowel LengthGlottalizationCollating OrderGlyph PlacementTable 10-6. Positions of Limbu Combining CharactersPunctuationDigits10.6 Syloti NagriVirama and ConjunctsDigitsPunctuationPoetry Marks10.7 KaithiStandardsStylesRendering BehaviorVowel LettersConsonant ConjunctsRuled LinesNuktaPunctuationDigits10.8 SaurashtraGlyph PlacementDigitsPunctuationSaurashtra Consonant Sign Haaru10.9 Meetei MayekStructureVowel LettersFinal ConsonantsAbbreviationsOrderPunctuationDigits10.10 Ol ChikiStructureDigitsPunctuationModifier LettersGlottalizationAspirationLigatures10.11 KharoshthiKharoshthi: U+10A00–U+10A5FFigure 10-5. Geographical Extent of the Kharoshthi ScriptDirectionalityDiacritic Marks and VowelsNumeralsFigure 10-6. 
Kharoshthi Number 1996PunctuationWord Breaks, Line Breaks, and HyphenationSortingRendering KharoshthiFigure 10-7. Kharoshthi Rendering ExampleCombining VowelsTable 10-7. Kharoshthi Vowel SignsCombining Vowel ModifiersTable 10-8. Kharoshthi Vowel ModifiersCombining Consonant ModifiersTable 10-9. Kharoshthi Consonant ModifiersViramaTable 10-10. Examples of Kharoshthi Virama10.12 BrahmiBrahmi: U+11000–U+1106FEncoding ModelVowel LettersTable 10-11. Brahmi Vowel LettersRendering BehaviorFigure 10-8. Consonant Ligatures in BrahmiVowel ModifiersOld Tamil BrahmiBhattiprolu BrahmiPunctuationNumeralsTable 10-12. Brahmi Positional Digits11 Southeast Asian Scripts11.1 ThaiStandards.Encoding Principles.Table 11-1. Glyph Positions in Thai SyllablesRendering of Thai Combining MarksThai PunctuationSpacingThai Transcription of Pali and Sanskrit11.2 LaoEncoding PrinciplesPunctuationGlyph PlacementTable 11-2. Glyph Positions in Lao SyllablesAdditional LettersRendering of Lao Combining MarksLao Aspirated Nasals11.3 MyanmarMyanmar: U+1000–U+109FStandardsEncoding PrinciplesComposite CharactersEncoding SubrangesConjunctsKinziMedial ConsonantsAsatContractionsGreat saTall aaOrdering of Syllable ComponentsTable 11-3. Myanmar Syllabic StructureSpacing.Myanmar Extended-A: U+AA60–U+AA7FKhamti ShanConsonantsVowelsTonesTable 11-4. Khamti Shan Tone MarksDigitsOther SymbolsSubjoined CharactersHistorical Khamti ShanAiton and PhakeConsonantsSubjoined ConsonantsVowelsLigaturesTones11.4 KhmerKhmer: U+1780–U+17FFPrinciples of the Khmer ScriptGlottal ConsonantTable 11-5. Independent Khmer Vowel CharactersSubscript ConsonantsSubscript Independent Vowel SignsConsonant RegistersTable 11-6. Two Registers of Khmer ConsonantsEncoding PrinciplesSubscript Consonant SignsTable 11-7. Khmer Subscript Consonant SignsDependent Vowel SignsTable 11-8. Khmer Composite Dependent Vowel Signs with NikahitIndependent Vowel CharactersSubscript Independent Vowel SignsTable 11-9. 
Khmer Subscript Independent Vowel SignsOther Signs as Syllabic ComponentsLigaturesFigure 11-1. Common Ligatures in KhmerMultiple GlyphsFigure 11-2. Common Multiple Forms in KhmerCharacters Whose Use Is DiscouragedOrdering of Syllable Components.Figure 11-3. Examples of Syllabic Order in KhmerConsonant ShiftersLigature ControlFigure 11-4. Ligation in Muul Style in KhmerSpacing.Khmer Symbols: U+19E0–U+19FFSymbols11.5 Tai LeTable 11-10. Tai Le Tone MarksDigits.Table 11-11. Myanmar DigitsPunctuation.11.6 New Tai LueSyllabic StructureTable 11-12. New Tai Lue Vowel PlacementFinal ConsonantsTonesTable 11-13. New Tai Lue Registers and TonesDigits11.7 Tai ThamConsonantsIndependent VowelsDependent Consonant SignsDependent Vowel SignsTone MarksOther Combining MarksDigitsPunctuationCollating OrderLinebreaking11.8 Tai VietStructureVisual OrderTone Classes and Tone MarksFinal ConsonantsSymbols and PunctuationTable 11-14. Tai Viet Symbols and PunctuationWord SpacingCollating Order11.9 Kayah LiStructureVowelsTonesDigitsPunctuation11.10 ChamStructureIndependent Vowel LettersConsonantsOrdering of Syllable ComponentsTable 11-15. Cham Syllabic StructureDigitsPunctuationLine Breaking11.11 Philippine ScriptsTagalog: U+1700–U+171FHanunóo: U+1720–U+173FBuhid: U+1740–U+175FTagbanwa: U+1760–U+177FPrinciples of the Philippine ScriptsConsonant Letters.Independent Vowel Letters.Dependent Vowel Signs.Virama.Directionality.Rendering.Table 11-16. Hanunoo and Buhid Vowel Sign CombinationsPunctuation.11.12 BugineseStructureLigatureFigure 11-5. Buginese LigatureOrderPunctuationNumerals11.13 BalineseStructureTable 11-17. Balinese Base Consonants and Conjunct FormsTable 11-18. Sasak Extensions for BalineseBehavior of raFigure 11-6. Writing dharma in BalineseBehavior of ra repaRenderingTable 11-19. Balinese Consonant Clusters with u and u:NuktaOrderingPunctuationHyphenationMusical SymbolsModre Symbols11.14 JavaneseConsonantsIndependent VowelsDependent VowelsFigure 11-7. 
Representation of Javanese Two-Part VowelsConsonant SignsRenderingDigitsPunctuationReduplicationOrdering of Syllable ComponentsLinebreaking11.15 RejangStructureRenderingOrderingDigitsPunctuation11.16 BatakStructureRenderingPunctuationLinebreaking11.17 SundaneseStructureConsonant AdditionsDigitsPunctuationOrderingOrdering of Syllable ComponentsTable 11-20. Sundanese Syllabic StructureRendering12 East Asian Scripts12.1 HanCJK Unified IdeographsCJK StandardsTable 12-1. Sources for Unified HanSource Label Discrepancies in Version 6.0Omission of Repertoire for Some SourcesBlocks Containing Han IdeographsTable 12-2. Blocks Containing Han IdeographsTable 12-3. Small Extensions to the UROIICoreGeneral Characteristics of Han IdeographsTable 12-4. Common Han CharactersTerminologyDistinguishing Han Character Usage Between LanguagesFigure 12-1. Han SpellingFigure 12-2. Semantic Context for Han CharactersSimplified and Traditional ChineseDialects and Early Forms of ChineseSorting Han IdeographsCharacter GlyphsPrinciples of Han UnificationThree-Dimensional Conceptual ModelFigure 12-3. Three-Dimensional Conceptual ModelUnification RulesFigure 12-4. CJK Source SeparationTable 12-5. Source Encoding for Sword VariantsFigure 12-5. Not Cognates, Not UnifiedAbstract ShapeTwo-Level ClassificationIdeographic Component StructureFigure 12-6. Ideographic Component StructureFigure 12-7. The Most Superior Node of an Ideographic ComponentIdeograph FeaturesUniqueness or UnificationSpatial PositioningExamplesTable 12-6. Ideographs Not UnifiedTable 12-7. Ideographs UnifiedHan Ideograph ArrangementTable 12-8. 
Han Ideograph ArrangementRadical-Stroke IndicesMappings for Han IdeographsCJK Unified Ideographs Extension B: U+20000–U+2A6D6CJK Unified Ideographs Extension C: U+2A700–U+2B734CJK Unified Ideographs Extension D: U+2B740–U+2B81DCJK Compatibility Ideographs: U+F900–U+FAFFCJK Compatibility Supplement: U+2F800–U+2FA1DKanbun: U+3190–U+319FSymbols Derived from Han IdeographsCJK and KangXi Radicals: U+2E80–U+2FD5Standards.Semantics.CJK Additions from HKSCS and GB 18030CJK Strokes: U+31C0–U+31EF12.2 Ideographic Description CharactersApplicability to Other ScriptsIdeographic Description SequencesFigure 12-8. Using the Ideographic Description CharactersEquivalence.Interaction with the Ideographic Variation Mark.Rendering.Character Boundaries.Standards.12.3 BopomofoStandardsMandarin Tone MarksTable 12-9. Mandarin Tone MarksStandard Mandarin BopomofoExtended BopomofoExtended Bopomofo Tone Marks.Table 12-10. Minnan and Hakka Tone MarksRendering of Bopomofo12.4 Hiragana and KatakanaHiragana: U+3040–U+309FStandardsCombining MarksIteration MarksVertical Text DigraphKatakana: U+30A0–U+30FFStandardsPunctuation-like CharactersVertical Text DigraphKatakana Phonetic Extensions: U+31F0–U+31FFStandardsKana Supplement: U+1B000–U+1B0FFFigure 12-9. Japanese Historic Kana for e and ye12.5 Halfwidth and Fullwidth FormsUnifications12.6 HangulHangul Jamo: U+1100–U+11FFHangul Jamo Extended-A: U+A960–U+A97FHangul Jamo Extended-B: U+D7B0–U+D7FFHangul Compatibility Jamo: U+3130–U+318FStandardsNormalizationTable 12-11. Separating Jamo CharactersHangul Syllables: U+AC00–U+D7A3StandardsEquivalenceHangul Syllable CompositionHangul Syllable DecompositionHangul Syllable NameHangul Syllable Representative GlyphTable 12-12. 
Line-Based Placement of JungseongCollation12.7 YiTraditional Yi ScriptStandardized Yi ScriptStandardsNaming Conventions and OrderYi Syllable Iteration MarkPunctuationRenderingYi Radicals13 Additional Modern Scripts13.1 EthiopicEthiopic: U+1200–U+137FBasic and Extended Ethiopic.Encoding Principles.Variant Glyph FormsLabialized SubseriesTable 13-1. Labialized Forms in Ethiopic -WAATable 13-2. Labialized Forms in Ethiopic -WEKeyboard Input.Syllable NamesEncoding Order and SortingWord SeparatorsSection MarkDiacritical MarksNumbersEthiopic Extensions13.2 MongolianHistoryDirectionalityEncoding PrinciplesFigure 13-1. Mongolian Glyph ConvergenceCursive JoiningFigure 13-2. Mongolian Consonant LigationFigure 13-3. Mongolian Positional FormsFree Variation SelectorsFigure 13-4. Mongolian Free Variation SelectorRepresentative GlyphsVowel HarmonyFigure 13-5. Mongolian Gender FormsNarrow No-Break SpaceMongolian Vowel SeparatorFigure 13-6. Mongolian Vowel SeparatorNumbersPunctuationNiruguSyllable Boundary Marker13.3 OsmanyaStructureOrderingNames and Glyphs13.4 TifinaghHistorySource StandardsOrderingDirectionalityDiacritical Marks.Contextual ShapingFigure 13-7. Tifinagh Contextual ShapingBi-ConsonantsFigure 13-8. Tifinagh Consonant Joiner and Bi-consonants13.5 N'KoStructureDigitsDiacritical MarksTable 13-3. N'Ko Tone Diacritics on VowelsTable 13-4. Other N'Ko Diacritic UsageOrdinal NumbersFigure 13-9. Examples of N'Ko OrdinalsPunctuationCharacter Names and Block NameOrderingRenderingTable 13-5. 
N'Ko Letter Shaping13.6 VaiSourcesBasic StructureHistoric SyllablesLogogramsDigitsPunctuationSegmentationOrdering13.7 BamumBamum: U+A6A0–U+A6FFStructureDiacritical MarksPunctuationDigitsBamum Supplement: U+16800–U+16A3F13.8 CherokeeTones.Case and Spelling.Numbers.Rendering and InputPunctuation.Standards.13.9 Canadian Aboriginal SyllabicsCanadian Aboriginal Syllabics: U+1400–U+167FOrganizationArrangementExtensionsPunctuation and SymbolsCanadian Aboriginal Syllabics Extended: U+18B0–U+18FF13.10 DeseretLetter Names and Shapes.Structure.Sorting.Typographic Conventions.Figure 13-10. Short Words Equivalent to Deseret Letter NamesPhonetics.Table 13-6. IPA Transcription of Deseret13.11 ShavianStructure.Collation13.12 LisuStructureTone LettersTable 13-7. Lisu Tone LettersOther Modifier LettersDigits and SeparatorsPunctuationTable 13-8. Punctuation Adopted in Lisu OrthographyLinebreakingWord Separation14 Ancient and Historic Scripts14.1 OghamStructure.Rendering.Forfeda (Supplementary Characters)14.2 Old ItalicDirectionalityPunctuationNumeralsGlyphsFigure 14-1. Distribution of Old Italic14.3 RunicHistorical ScriptDirectionThe Runic AlphabetRepresentative GlyphsUnificationsLong-Branch and Short-TwigStaveless RunesPunctuation MarksGolden NumbersEncoding14.4 GothicDiacriticsNumeralsPunctuation14.5 Old TurkicStructureDirectionalityPunctuation14.6 Linear BLinear B Syllabary: U+10000–U+1007FStandardsLinear B Ideograms: U+10080–U+100FFAegean Numbers: U+10100–U+1013F14.7 Cypriot SyllabaryTable 14-1. Similar Characters in Linear B and Cypriot14.8 Ancient Anatolian AlphabetsLycian: U+10280–U+1029FCarian: U+102A0–U+102DFLydian: U+10920–U+1093FLycianCarianLydian14.9 Old South ArabianDirectionalityStructureSegmentationMonogramsNumbersTable 14-2. Old South Arabian Numeric CharactersTable 14-3. Number Formation in Old South ArabianNames14.10 PhoenicianDirectionalityPunctuationStylistic VariationNumeralsNames14.11 Imperial AramaicDirectionalityPunctuationNumbersTable 14-4. 
Number Formation in Aramaic14.12 MandaicStructurePunctuationDirectionalityShaping and Layout BehaviorTable 14-5. Dual-Joining Mandaic CharactersTable 14-6. Right-Joining Mandaic CharactersLinebreaking14.13 Inscriptional Parthian and Inscriptional PahlaviDirectionalityShaping and Layout BehaviorTable 14-7. Inscriptional Parthian Shaping BehaviorNumbersHeterograms14.14 AvestanDirectionalityShaping BehaviorTable 14-8. Avestan Shaping BehaviorPunctuation14.15 UgariticVariant GlyphsOrdering.Character Names14.16 Old PersianDirectionalityRepertoireNumeralsVariants14.17 Sumero-AkkadianCuneiform: U+12000–U+123FFEarly History of CuneiformGeographic RangeTable 14-9. Cuneiform Script UsageSources and CoverageSimple SignsComplex and Compound SignsMergers and SplitsGlyph Variants Acquiring Independent Semantic StatusFormattingOrderingOther StandardsCuneiform Numbers and Punctuation: U+12400–U+1247FCuneiform PunctuationCuneiform Numerals14.18 Egyptian HieroglyphsStructureDirectionalityRenderingTable 14-10. Hieroglyphic Character SequenceFigure 14-2. Interpretation of Hieroglyphic MarkupHieratic FontsRepertoireCharacter NamesSign ClassificationEnclosuresNumerals15 Symbols15.1 Currency SymbolsUnificationFigure 15-1. Alternative Glyphs for Dollar SignFonts.Table 15-1. Currency Symbols Encoded in Other BlocksLira SignYen and YuanEuro SignIndian Rupee Sign15.2 Letterlike SymbolsLetterlike Symbols: U+2100–U+214FNumero SignFigure 15-2. Alternative Glyphs for Numero SignUnit SymbolsCompatibilityStylesStandardsMathematical Alphanumeric Symbols: U+1D400–U+1D7FFWords Used as Variables.Mathematical AlphabetsBasic Set of Alphanumeric CharactersAdditional CharactersDotless CharactersFigure 15-3. Wide Mathematical AccentsSemantic Distinctions.Figure 15-4. Style Variants and Semantic Distinctions in MathematicsMathematical AlphabetsTable 15-2. Mathematical Alphanumeric SymbolsCompatibility DecompositionsFonts Used for Mathematical AlphabetsFrakturMath ItalicsFigure 15-5. 
Easily Confused Shapes for Mathematical GlyphsHard-to-Distinguish LettersFont Support for Combining DiacriticsType Style for Script CharactersDouble-Struck Characters15.3 Number FormsNumber Forms: U+2150–U+218FFractionsFigure 15-6. Alternate Forms of Vulgar FractionsRoman NumeralsCommon Indic Number Forms: U+A830–U+A83FRumi Numeral Forms: U+10E60–U+10E7ECJK Number FormsChinese Counting-Rod NumeralsSuzhou-Style NumeralsSuperscripts and Subscripts: U+2070–U+209FParsing of Superscript and Subscript DigitsStandardsSuperscripts and Subscripts in Other Blocks15.4 Mathematical SymbolsSemantics.Mathematical PropertyMathematical Operators: U+2200–U+22FFStandardsEncoding PrinciplesUnificationsGreek-Derived SymbolsN-ary OperatorsInvisible OperatorsMinus SignDelimitersBidirectional LayoutOther Elements of Mathematical NotationSupplements to Mathematical Symbols and ArrowsStandards.Supplemental Mathematical Operators: U+2A00–U+2AFFMiscellaneous Mathematical Symbols-A: U+27C0–U+27EFMathematical Brackets.Long DivisionMiscellaneous Mathematical Symbols-B: U+2980–U+29FFWiggly Fence.Miscellaneous Symbols and Arrows: U+2B00–U+2B7FArrows: U+2190–U+21FFBidirectional LayoutStandardsUnificationsSupplemental ArrowsLong Arrows.Standardized Variants of Mathematical SymbolsChange in Representative Glyphs for U+2278 and U+227915.5 Invisible Mathematical OperatorsInvisible SeparatorInvisible MultiplicationInvisible PlusInvisible Function Application15.6 Technical SymbolsControl Pictures: U+2400–U+243FCode Points for Pictures for Control CodesPictures for ASCII SpaceStandardsMiscellaneous Technical: U+2300–U+23FFKeytop Labels.Floor and CeilingCrops and Quine CornersFigure 15-7. Usage of Crops and Quine CornersAngle Brackets.APL Functional SymbolsSymbol Pieces.Table 15-3. Use of Mathematical Symbol PiecesHorizontal BracketsTerminal Graphics Characters.Decimal Exponent SymbolFigure 15-8. 
Usage of the Decimal Exponent SymbolDental Symbols.Metrical SymbolsElectrotechnical SymbolsUser Interface SymbolsStandards.Optical Character Recognition: U+2440–U+245FStandards15.7 Geometrical SymbolsBox Drawing and Block ElementsBox DrawingBlock ElementsStandardsGeometric Shapes: U+25A0–U+25FFHatched SquaresLozengeUse in MathematicsStandards15.8 Miscellaneous SymbolsRendering of Emoji SymbolsColor Words in Unicode Character NamesMiscellaneous Symbols: U+2600–U+26FFMiscellaneous Symbols and Pictographs: U+1F300–U+1F5FFStandardsWeather SymbolsTraffic SignsDictionary and Map SymbolsPlastic Bottle Material Code System.Recycling Symbol for Generic MaterialsUniversal Recycling SymbolPaper Recycling SymbolsGender SymbolsGenealogical SymbolsGame SymbolsAnimal SymbolsCultural SymbolsMiscellaneous Symbols in Other BlocksEmoticons: U+1F600–U+1F64FTransport and Map Symbols: U+1F680–U+1F6FFDingbats: U+2700–U+27BFUnifications and Additions.Ornamental Brackets.Alchemical Symbols: U+1F700–U+1F77FMahjong Tiles: U+1F000–U+1F02FDomino Tiles: U+1F030–U+1F09FPlaying Cards: U+1F0A0–U+1F0FFYijing Hexagram Symbols: U+4DC0–U+4DFFTai Xuan Jing Symbols: U+1D300–U+1D356MonogramsDigramsTetragramsAncient Symbols: U+10190–U+101CFPhaistos Disc Symbols: U+101D0–U+101FF15.9 Enclosed and SquareEnclosed SymbolsSquare SymbolsSource StandardsAllocationDecompositionCasingEnclosed Alphanumerics: U+2460–U+24FFEnclosed CJK Letters and Months: U+3200–U+32FFCJK Compatibility: U+3300–U+33FFJapanese Era NamesTable 15-4. Japanese Era NamesEnclosed Alphanumeric Supplement: U+1F100–U+1F1FFRegional Indicator SymbolsEnclosed Ideographic Supplement: U+1F200–U+1F2FF15.10 BrailleExampleUsage ModelImaging.Script15.11 Western Musical SymbolsGlyphsSymbols in Other BlocksGregorianProcessing.Input Methods.Directionality.Figure 15-9. Examples of Specialized Music LayoutFormat Characters.Precomposed Note CharactersFigure 15-10. Precomposed Note CharactersAlternative Noteheads.Figure 15-11. 
Alternative NoteheadsAugmentation Dots and Articulation SymbolsFigure 15-12. Augmentation Dots and Articulation SymbolsOrnamentation.Table 15-5. Examples of Ornamentation15.12 Byzantine Musical SymbolsProcessing.15.13 Ancient Greek Musical NotationUnificationTable 15-6. Representation of Ancient Greek Vocal and Instrumental NotationNaming ConventionsFontCombining Marks16 Special Areas and Format Characters16.1 Control CodesRepresenting Control SequencesEscape SequencesSpecification of Control Code SemanticsTable 16-1. Control Codes Specified in the Unicode StandardNewline Function16.2 Layout ControlsLine and Word BreakingNo-Break SpaceWord JoinerZero Width No-Break SpaceZero Width SpaceTable 16-2. Letter SpacingZero-Width Spaces and Joiner CharactersHyphenationLine and Paragraph SeparatorCursive Connection and LigaturesJoinerNon-joinerCursive ConnectionFigure 16-1. Prevention of JoiningFigure 16-2. Exhibition of Joining Glyphs in IsolationExamples.Figure 16-3. Effect of Intervening JoinersTransparencyJoiner and Non-joiner in Indic ScriptsImplementation Notes.Filtering Joiner and Non-joinerCombining Grapheme JoinerBlocking ReorderingCGJ and CollationRenderingCGJ and Joiner CharactersBidirectional Ordering ControlsTable 16-3. Bidirectional Ordering Controls16.3 Deprecated Format CharactersSymmetric SwappingCharacter Shaping SelectorsNumeric Shape Selectors16.4 Variation SelectorsVariation SequenceMongolian16.5 Private-Use CharactersProperties.Normalization.Private Use Area: U+E000–U+F8FFEncoding Structure.Corporate Use SubareaEnd-User Subarea.Allocation of SubareasSupplementary Private Use AreasEncoding Structure.16.6 Surrogates AreaHigh-SurrogateLow-SurrogatePrivate-Use High-Surrogates16.7 NoncharactersU+FFFF and U+10FFFFU+FFFE16.8 SpecialsByte Order Mark (BOM): U+FEFFTable 16-4. Unicode Encoding Scheme SignaturesTable 16-5. U+FEFF Signature in Other CharsetsSpecials: U+FFF0–U+FFF8Annotation Characters: U+FFF9–U+FFFBFigure 16-4. 
Annotation CharactersConformanceUse in Plain TextLexical RestrictionsFormattingInputCollationBidirectional TextReplacement Characters: U+FFFC–U+FFFDU+FFFCU+FFFD16.9 Deprecated Tag CharactersDeprecated Tag Characters: U+E0000–U+E007FSyntax for Embedding TagsTag IdentificationTag TerminationLanguage Tags.Tag Scope and NestingFigure 16-5. Tag CharactersCanceling Tag ValuesWorking with Language TagsAvoiding Language TagsHigher-Level Protocols.Effect of Tags on Interpretation of TextDisplayProcessingRange Checking for Tag CharactersEditing and ModificationDangers of Incomplete SupportUnicode Conformance IssuesFormal Tag Syntax17 About the Code Charts17.1 Character Names ListImages in the Code Charts and Character ListsFontsAlternative FormsOrientationSpecial Characters and Code PointsCombining CharactersDashed Box ConventionReserved CharactersNoncharactersDeprecated CharactersCharacter NamesInformative AliasesNormative AliasesCross ReferencesExplicit InequalityOther Linguistic RelationshipsInformation About LanguagesCase MappingsDecompositionsSubheads17.2 CJK Unified and Compatibility IdeographsCJK Unified IdeographsTable 17-1. IRG SourcesChart for the Main CJK BlockFigure 17-1. CJK Chart Format for the Main CJK BlockCharts for CJK ExtensionsFigure 17-2. CJK Chart Format for CJK Extension AFigure 17-3. CJK Chart Format for CJK Extension BCompatibility IdeographsFigure 17-4. CJK Chart Format for Compatibility Ideographs17.3 Hangul SyllablesA Notational ConventionsCode PointsCharacter NamesCharacter BlocksSequencesRenderingFigure A-1. Example of RenderingProperties and Property ValuesMiscellaneousExtended BNFTable A-1. Extended BNFCharacter ClassesTable A-2. Character Class ExamplesOperatorsTable A-3. 
OperatorsB Unicode Publications and ResourcesB.1 The Unicode ConsortiumThe Unicode Technical CommitteeOther ActivitiesB.2 Unicode PublicationsB.3 Unicode Technical StandardsUTS #6: A Standard Compression Scheme for UnicodeUTS #10: Unicode Collation AlgorithmUTS #18: Unicode Regular ExpressionsUTS #22: Character Mapping Markup Language (CharMapML)UTS #35: Unicode Locale Data Markup Language (LDML)UTS #37: Unicode Ideographic Variation DatabaseUTS #39: Unicode Security MechanismsB.4 Unicode Technical ReportsUTR #16: UTF-EBCDICUTR #17: Unicode Character Encoding ModelUTR #20: Unicode in XML and Other Markup LanguagesUTR #23: The Unicode Character Property ModelUTR #25: Unicode Support for MathematicsUTR #26: Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8)UTR #33: Unicode Conformance ModelUTR #36: Unicode Security ConsiderationsUTR #45: U-Source IdeographsB.5 Unicode Technical NotesB.6 Other Unicode Online ResourcesUnicode Online ResourcesUnicode Web SiteUnicode Anonymous FTP SiteChartsCharacter IndexConferencesE-mail Discussion ListFAQ (Frequently Asked Questions)GlossaryOnline Unicode Character DatabaseOnline Unihan DatabasePoliciesUnicode Common Locale Data Repository (CLDR)Updates and ErrataVersionsWhere Is My Character?How to Contact the Unicode ConsortiumC Relationship to ISO/IEC 10646C.1 HistoryTable C-1. TimelineC.2 Encoding Forms in ISO/IEC 10646UCS-4UCS-2Zero ExtendingTable C-2. Zero ExtendingC.3 UTF-8 and UTF-16UTF-8UTF-16C.4 Synchronization of the StandardsC.5 Identification of Features for the Unicode StandardC.6 Character NamesC.7 Character Functional SpecificationsD Changes from Previous VersionsD.1 Versions of the Unicode StandardTable D-1. Versions of Unicode and ISO/IEC 10646-1Table D-2. Allocation of Code Points by TypeTable D-3. Allocation of Code Points by Type (Early Versions)D.2 Clause and Definition UpdatesTable D-4. Version 5.1 Clause and Definition UpdatesTable D-5. Version 5.2 Clause and Definition UpdatesTable D-6. 
Version 6.0 Clause and Definition UpdatesD.3 Changes from Version 5.2 to Version 6.0D.4 Changes from Version 5.1 to Version 5.2D.5 Changes from Version 5.0 to Version 5.1Arabic ShapingBidirectional BehaviorGeneral CategoryNamed Character SequencesNew Property Definitions and ValuesOther UpdatesUnihanStability PoliciesImportant Clarification of UTF-8 ConformanceUpdates to Definitions of Character SequencesUpdates to Table of Named Unicode AlgorithmsUpdates to Default AlgorithmsUpdates to Stability of PropertiesUpdates for SecurityE Han Unification HistoryE.1 Development of the UROE.2 Ideographic Rapporteur GroupF Documentation of CJK StrokesTable F-1. CJK StrokesR ReferencesR.1 Source Standards and SpecificationsR.2 Source Dictionaries for Han UnificationR.3 Other Sources for the Unicode StandardR.4 Selected Resources: TechnicalR.5 Selected Resources: OtherI General IndexABCDEFGHIJKLMNOPQRSTUVWXYZ