Unicode 5.2 Web Bookmarks

About this page
This page contains hyperlinks to The Unicode Standard, Version 5.2. The Unicode 5.2.0 page lists the contents with links to each PDF file.

Front Matter
Title and Copyright Foreword Contents List of Figures List of Tables Preface What's New Organization of This Standard Unicode Standard Annexes The Unicode Character Database Unicode Code Charts Unicode Technical Standards and Unicode Technical Reports Updates and Errata

1 Introduction
1.1 Coverage Standards Coverage New Characters
1.2 Design Goals
1.3 Text Handling Text Elements

2 General Structure
2.1 Architectural Context Basic Text Processes Text Elements, Characters, and Text Processes Text Processes and Encoding
2.2 Unicode Design Principles Universality Efficiency Characters, Not Glyphs Semantics Plain Text Logical Order Unification Dynamic Composition Stability Convertibility
2.3 Compatibility Characters Compatibility Variants Compatibility Decomposable Characters
2.4 Code Points and Characters Types of Code Points
2.5 Encoding Forms UTF-32 UTF-16 UTF-8 Comparison of the Advantages of UTF-32, UTF-16, and UTF-8
2.6 Encoding Schemes
2.7 Unicode Strings
2.8 Unicode Allocation Planes Allocation Areas and Character Blocks Assignment of Code Points
2.9 Details of Allocation Plane 0 (BMP) Plane 1 (SMP) Plane 2 (SIP) Other Planes
2.10 Writing Direction
2.11 Combining Characters Sequence of Base Characters and Diacritics Multiple Combining Characters Ligated Multiple Base Characters Exhibiting Nonspacing Marks in Isolation "Characters" and Grapheme Clusters
2.12 Equivalent Sequences and Normalization Normalization Decompositions Non-decomposition of Overlaid Diacritics
2.13 Special Characters and Noncharacters Special Noncharacter Code Points Byte Order Mark (BOM) Layout and Format Control Characters The Replacement Character Control Codes
2.14 Conforming to the Unicode Standard Characteristics of Conformant Implementations Unacceptable Behavior Acceptable Behavior Supported Subsets

3 Conformance
3.1 Versions of the Unicode Standard Stability Version Numbering Errata and Corrigenda References to the Unicode Standard Precision in Version Citation References to Unicode Character Properties References to Unicode Algorithms
3.2 Conformance Requirements Code Points Unassigned to Abstract Characters Interpretation Modification Character Encoding Forms Character Encoding Schemes Bidirectional Text Normalization Forms Normative References Unicode Algorithms Default Casing Algorithms Unicode Standard Annexes
3.3 Semantics Definitions Character Identity and Semantics
3.4 Characters and Encoding
3.5 Properties Types of Properties Property Values Classification of Properties by Their Values Property Status Context Dependence Stability of Properties Simple and Derived Properties Property Aliases Private Use
3.6 Combination Combining Character Sequences Grapheme Clusters Application of Combining Marks
3.7 Decomposition Compatibility Decomposition Canonical Decomposition
3.8 Surrogates
3.9 Unicode Encoding Forms UTF-32 UTF-16 UTF-8 Encoding Form Conversion Constraints on Conversion Processes
3.10 Unicode Encoding Schemes
3.11 Normalization Forms Normalization Stability Combining Classes Specification of Unicode Normalization Forms Starters Canonical Ordering Algorithm Canonical Composition Algorithm Definition of Normalization Forms
3.12 Conjoining Jamo Behavior Definitions Hangul Syllable Boundary Determination Standard Korean Syllables Hangul Syllable Composition Hangul Syllable Decomposition Hangul Syllable Name Generation
3.13 Default Case Algorithms Definitions Default Case Conversion Default Case Detection Default Caseless Matching

4 Character Properties
4.1 Unicode Character Database
4.2 Case-Normative Definitions of Case and Casing Case Mapping
4.3 Combining Classes-Normative Reordrant, Split, and Subjoined Combining Marks
4.4 Directionality-Normative
4.5 General Category-Normative
4.6 Numeric Value-Normative Ideographic Numeric Values
4.7 Bidi Mirrored-Normative
4.8 Name-Normative Unicode Name Property Code Point Labels Use of Character Names in APIs and User Interfaces
4.9 Unicode 1.0 Names
4.10 Letters, Alphabetic, and Ideographic
4.11 Properties Related to Text Boundaries
4.12 Characters with Unusual Properties

5 Implementation Guidelines
5.1 Transcoding to Other Standards Issues Multistage Tables
5.2 Programming Languages and Data Types Unicode Data Types for C
5.3 Unknown and Missing Characters Reserved and Private-Use Character Codes Interpretable but Unrenderable Characters Default Property Values Default Ignorable Code Points Interacting with Downlevel Systems
5.4 Handling Surrogate Pairs in UTF-16
5.5 Handling Numbers
5.6 Normalization
5.7 Compression
5.8 Newline Guidelines Definitions Line Separator and Paragraph Separator Recommendations
5.9 Regular Expressions
5.10 Language Information in Plain Text Requirements for Language Tagging Language Tags and Han Unification
5.11 Editing and Selection Consistent Text Elements
5.12 Strategies for Handling Nonspacing Marks Keyboard Input Truncation
5.13 Rendering Nonspacing Marks Canonical Equivalence Positioning Methods
5.14 Locating Text Element Boundaries
5.15 Identifiers
5.16 Sorting and Searching Culturally Expected Sorting and Searching Language-Insensitive Sorting Searching Sublinear Searching
5.17 Binary Order UTF-8 in UTF-16 Order UTF-16 in UTF-8 Order
5.18 Case Mappings Titlecasing Complications for Case Mapping Reversibility Caseless Matching Normalization and Casing
5.19 Mapping Compatibility Variants
5.20 Unicode Security
5.21 Default Ignorable Code Points

6 Writing Systems and Punctuation
6.1 Writing Systems
6.2 General Punctuation Blocks Devoted to Punctuation Format Control Characters Space Characters Dashes and Hyphens Paired Punctuation Language-Based Usage of Quotation Marks Apostrophes Other Punctuation Archaic Punctuation and Editorial Marks Indic Punctuation CJK Punctuation Unknown or Unavailable Ideographs CJK Compatibility Forms

7 European Alphabetic Scripts
7.1 Latin Letters of Basic Latin: U+0041-U+007A Letters of the Latin-1 Supplement: U+00C0-U+00FF Latin Extended-A: U+0100-U+017F Latin Extended-B: U+0180-U+024F IPA Extensions: U+0250-U+02AF Phonetic Extensions: U+1D00-U+1DBF Latin Extended Additional: U+1E00-U+1EFF Latin Extended-C: U+2C60-U+2C7F Latin Extended-D: U+A720-U+A7FF Latin Ligatures: U+FB00-U+FB06
7.2 Greek Greek: U+0370-U+03FF Greek Extended: U+1F00-U+1FFF Ancient Greek Numbers: U+10140-U+1018F
7.3 Coptic
7.4 Cyrillic Cyrillic: U+0400-U+04FF Cyrillic Supplement: U+0500-U+052F Cyrillic Extended-A: U+2DE0-U+2DFF Cyrillic Extended-B: U+A640-U+A69F
7.5 Glagolitic
7.6 Armenian
7.7 Georgian
7.8 Modifier Letters Spacing Modifier Letters: U+02B0-U+02FF Modifier Tone Letters: U+A700-U+A71F
7.9 Combining Marks Combining Diacritical Marks: U+0300-U+036F Combining Diacritical Marks Supplement: U+1DC0-U+1DFF Combining Marks for Symbols: U+20D0-U+20FF Combining Half Marks: U+FE20-U+FE2F Combining Marks in Other Blocks

8 Middle Eastern Scripts
8.1 Hebrew Hebrew: U+0590-U+05FF Alphabetic Presentation Forms: U+FB1D-U+FB4F
8.2 Arabic Arabic: U+0600-U+06FF Arabic Cursive Joining Arabic Ligatures Arabic Supplement: U+0750-U+077F Arabic Presentation Forms-A: U+FB50-U+FDFF Arabic Presentation Forms-B: U+FE70-U+FEFF
8.3 Syriac Syriac: U+0700-U+074F Syriac Shaping
8.4 Samaritan
8.5 Thaana

9 South Asian Scripts-I
9.1 Devanagari Devanagari: U+0900-U+097F Principles of the Devanagari Script Rendering Devanagari Devanagari Extended: U+A8E0-U+A8FF Vedic Extensions: U+1CD0-U+1CFF
9.2 Bengali (Bangla)
9.3 Gurmukhi
9.4 Gujarati
9.5 Oriya
9.6 Tamil Tamil: U+0B80-U+0BFF Tamil Vowels Tamil Ligatures Tamil Named Character Sequences
9.7 Telugu
9.8 Kannada Kannada: U+0C80-U+0CFF Principles of the Kannada Script Rendering Kannada
9.9 Malayalam

10 South Asian Scripts-II
10.1 Sinhala
10.2 Tibetan
10.3 Lepcha
10.4 Phags-pa
10.5 Limbu
10.6 Syloti Nagri
10.7 Kaithi
10.8 Saurashtra
10.9 Meetei Mayek
10.10 Ol Chiki
10.11 Kharoshthi Kharoshthi: U+10A00-U+10A5F Rendering Kharoshthi

11 Southeast Asian Scripts
11.1 Thai
11.2 Lao
11.3 Myanmar Myanmar: U+1000-U+109F Myanmar Extended-A: U+AA60-U+AA7F Khamti Shan Aiton and Phake
11.4 Khmer Khmer: U+1780-U+17FF Principles of the Khmer Script Khmer Symbols: U+19E0-U+19FF
11.5 Tai Le
11.6 New Tai Lue
11.7 Tai Tham
11.8 Tai Viet
11.9 Kayah Li
11.10 Cham
11.11 Philippine Scripts Tagalog: U+1700-U+171F Hanunóo: U+1720-U+173F Buhid: U+1740-U+175F Tagbanwa: U+1760-U+177F Principles of the Philippine Scripts
11.12 Buginese
11.13 Balinese
11.14 Javanese
11.15 Rejang
11.16 Sundanese

12 East Asian Scripts
12.1 Han CJK Unified Ideographs CJK Standards Blocks Containing Han Ideographs General Characteristics of Han Ideographs Principles of Han Unification Unification Rules Abstract Shape Han Ideograph Arrangement Mappings for Han Ideographs CJK Unified Ideographs Extension B: U+20000-U+2A6D6 CJK Unified Ideographs Extension C: U+2A700-U+2B734 CJK Compatibility Ideographs: U+F900-U+FAFF CJK Compatibility Supplement: U+2F800-U+2FA1D Kanbun: U+3190-U+319F Symbols Derived from Han Ideographs CJK and KangXi Radicals: U+2E80-U+2FD5 CJK Additions from HKSCS and GB 18030 CJK Strokes: U+31C0-U+31EF
12.2 Ideographic Description Characters
12.3 Bopomofo
12.4 Hiragana and Katakana Hiragana: U+3040-U+309F Katakana: U+30A0-U+30FF Katakana Phonetic Extensions: U+31F0-U+31FF
12.5 Halfwidth and Fullwidth Forms
12.6 Hangul Hangul Jamo: U+1100-U+11FF Hangul Jamo Extended-A: U+A960-U+A97F Hangul Jamo Extended-B: U+D7B0-U+D7FF Hangul Compatibility Jamo: U+3130-U+318F Hangul Syllables: U+AC00-U+D7A3
12.7 Yi

13 Additional Modern Scripts
13.1 Ethiopic Ethiopic: U+1200-U+137F Ethiopic Extensions: U+1380-U+139F, U+2D80-U+2DDF
13.2 Mongolian
13.3 Osmanya
13.4 Tifinagh
13.5 N'Ko
13.6 Vai
13.7 Bamum
13.8 Cherokee
13.9 Canadian Aboriginal Syllabics Canadian Aboriginal Syllabics: U+1400-U+167F Canadian Aboriginal Syllabics Extended: U+18B0-U+18FF
13.10 Deseret
13.11 Shavian
13.12 Lisu

14 Ancient and Historic Scripts
14.1 Ogham
14.2 Old Italic
14.3 Runic
14.4 Gothic
14.5 Old Turkic
14.6 Linear B Linear B Syllabary: U+10000-U+1007F Linear B Ideograms: U+10080-U+100FF Aegean Numbers: U+10100-U+1013F
14.7 Cypriot Syllabary
14.8 Ancient Anatolian Alphabets Lycian: U+10280-U+1029F Carian: U+102A0-U+102DF Lydian: U+10920-U+1093F
14.9 Old South Arabian
14.10 Phoenician
14.11 Imperial Aramaic
14.12 Inscriptional Parthian and Inscriptional Pahlavi
14.13 Avestan
14.14 Ugaritic
14.15 Old Persian
14.16 Sumero-Akkadian Cuneiform: U+12000-U+123FF Cuneiform Numbers and Punctuation: U+12400-U+1247F
14.17 Egyptian Hieroglyphs

15 Symbols
15.1 Currency Symbols
15.2 Letterlike Symbols Letterlike Symbols: U+2100-U+214F Mathematical Alphanumeric Symbols: U+1D400-U+1D7FF Mathematical Alphabets Fonts Used for Mathematical Alphabets
15.3 Number Forms Number Forms: U+2150-U+218F Common Indic Number Forms: U+A830-U+A83F Rumi Numeral Forms: U+10E60-U+10E7E CJK Number Forms Superscripts and Subscripts: U+2070-U+209F
15.4 Mathematical Symbols Mathematical Operators: U+2200-U+22FF Supplements to Mathematical Symbols and Arrows Supplemental Mathematical Operators: U+2A00-U+2AFF Miscellaneous Mathematical Symbols-A: U+27C0-U+27EF Miscellaneous Mathematical Symbols-B: U+2980-U+29FF Miscellaneous Symbols and Arrows: U+2B00-U+2B7F Arrows: U+2190-U+21FF Supplemental Arrows Standardized Variants of Mathematical Symbols
15.5 Invisible Mathematical Operators
15.6 Technical Symbols Control Pictures: U+2400-U+243F Miscellaneous Technical: U+2300-U+23FF Optical Character Recognition: U+2440-U+245F
15.7 Geometrical Symbols Box Drawing and Box Elements Geometric Shapes: U+25A0-U+25FF
15.8 Miscellaneous Symbols and Dingbats Miscellaneous Symbols: U+2600-U+26FF Dingbats: U+2700-U+27BF Mahjong Tiles: U+1F000-U+1F02F Domino Tiles: U+1F030-U+1F09F Yijing Hexagram Symbols: U+4DC0-U+4DFF Tai Xuan Jing Symbols: U+1D300-U+1D356 Ancient Symbols: U+10190-U+101CF Phaistos Disc Symbols: U+101D0-U+101FF
15.9 Enclosed and Square Enclosed Alphanumerics: U+2460-U+24FF Enclosed CJK Letters and Months: U+3200-U+32FF CJK Compatibility: U+3300-U+33FF Enclosed Alphanumeric Supplement: U+1F100-U+1F1FF Enclosed Ideographic Supplement: U+1F200-U+1F2FF
15.10 Braille
15.11 Western Musical Symbols
15.12 Byzantine Musical Symbols
15.13 Ancient Greek Musical Notation

16 Special Areas and Format Characters
16.1 Control Codes Representing Control Sequences Specification of Control Code Semantics
16.2 Layout Controls Line and Word Breaking Cursive Connection and Ligatures Combining Grapheme Joiner Bidirectional Ordering Controls
16.3 Deprecated Format Characters
16.4 Variation Selectors
16.5 Private-Use Characters Private Use Area: U+E000-U+F8FF Supplementary Private Use Areas
16.6 Surrogates Area
16.7 Noncharacters
16.8 Specials Byte Order Mark (BOM): U+FEFF Specials: U+FFF0-U+FFF8 Annotation Characters: U+FFF9-U+FFFB Replacement Characters: U+FFFC-U+FFFD
16.9 Deprecated Tag Characters Deprecated Tag Characters: U+E0000-U+E007F Syntax for Embedding Tags Working with Language Tags Unicode Conformance Issues Formal Tag Syntax

17 Code Charts
17.1 Character Names List Images in the Code Charts and Character Lists Special Characters and Code Points Character Names Informative Aliases Normative Aliases Cross References Information About Languages Case Mappings Decompositions Subheads
17.2 CJK Unified Ideographs
17.3 Hangul Syllables

18 Han Radical-Stroke Index

A Notational Conventions
Code Points Character Names Character Blocks Sequences Rendering Properties and Property Values Miscellaneous Extended BNF Operators

B Unicode Publications and Resources
B.1 The Unicode Consortium The Unicode Technical Committee Other Activities
B.2 Unicode Publications
B.3 Unicode Technical Standards UTS #6: A Standard Compression Scheme for Unicode UTS #10: Unicode Collation Algorithm UTS #18: Unicode Regular Expressions UTS #22: Character Mapping Markup Language (CharMapML) UTS #35: Unicode Locale Data Markup Language (LDML) UTS #37: Unicode Ideographic Variation Database UTS #39: Unicode Security Mechanisms
B.4 Unicode Technical Reports UTR #16: UTF-EBCDIC UTR #17: Unicode Character Encoding Model UTR #20: Unicode in XML and Other Markup Languages UTR #23: The Unicode Character Property Model UTR #25: Unicode Support for Mathematics UTR #26: Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8) UTR #33: Unicode Conformance Model UTR #36: Unicode Security Considerations UTR #45: U-Source Ideographs
B.5 Unicode Technical Notes
B.6 Other Unicode Online Resources Unicode Online Resources How to Contact the Unicode Consortium

C Relationship to ISO/IEC 10646
C.1 History
C.2 Encoding Forms in ISO/IEC 10646 Zero Extending
C.3 UCS Transformation Formats UTF-8 UTF-16
C.4 Synchronization of the Standards
C.5 Identification of Features for the Unicode Standard
C.6 Character Names
C.7 Character Functional Specifications

D Changes from Previous Versions
D.1 Versions of the Unicode Standard
D.2 Clause and Definition Updates
D.3 Changes from Version 5.1 to Version 5.2
D.4 Changes from Version 5.0 to Version 5.1

E Han Unification History
E.1 Development of the URO
E.2 Ideographic Rapporteur Group

R References
R.1 Source Standards and Specifications
R.2 Source Dictionaries for Han Unification
R.3 Other Sources for the Unicode Standard
R.4 Selected Resources: Technical
R.5 Selected Resources: Scripts and Languages

I General Index
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z