6.2.0 Core Specification for 6.3
All Chapters and Appendices Together:
Full Text pdf for Viewing (11 MB)
6.2.0 Front Matter for 6.3
Title and Copyright
Contents
Unicode 6.2 Web Bookmarks
List of Figures
List of Tables
Preface
6.2.0 Chapters for 6.3
1 Introduction
2 General Structure
3 Conformance
4 Character Properties
5 Implementation Guidelines
6 Writing Systems and Punctuation
7 European Alphabetic Scripts
8 Middle Eastern Scripts
9 South Asian Scripts - I
10 South Asian Scripts - II
11 Southeast Asian Scripts
12 East Asian Scripts
13 Additional Modern Scripts
14 Ancient and Historic Scripts
15 Symbols
16 Special Areas and Format Characters
17 About the Code Charts
6.2.0 Appendices and Back Matter
A Notational Conventions
B Unicode Publications and Resources
C Relationship to ISO/IEC 10646
D Changes from Previous Versions
E Han Unification History
F Documentation of CJK Strokes
R References
I General Index
Code Charts
Latest Code Charts
Delta Code Charts (additions to 6.3.0 highlighted)
Archival Code Charts (6.3.0)
Han Radical-Stroke Indices
Interactive Han Radical-Stroke Index
IICore Radical-Stroke Index (3.2 MB)
Full Han Radical-Stroke Index (25 MB, unchanged from 6.1.0)
6.3.0 Unicode Standard Annexes
UAX #9: The Unicode Bidirectional Algorithm
UAX #11: East Asian Width
UAX #14: Unicode Line Breaking Algorithm
UAX #15: Unicode Normalization Forms
UAX #24: Unicode Script Property
UAX #29: Unicode Text Segmentation
UAX #31: Unicode Identifier and Pattern Syntax
UAX #34: Unicode Named Character Sequences
UAX #38: Unicode Han Database (Unihan)
UAX #41: Common References for Unicode Standard Annexes
UAX #42: Unicode Character Database in XML
UAX #44: Unicode Character Database
UAX #45: U-Source Ideographs
6.3.0 UCD
6.3.0 (files) (about)
6.3.0 Zipped files (for bulk download)
Related Links
Unicode Acknowledgements
Archive of Unicode Versions
Updates and Errata
Glossary of Unicode Terms
Unicode Character Name Index
Technical Reports

Unicode® 6.3.0

Released: 2013 September 30 (Announcement)

Version 6.3.0 has been superseded by the latest version of the Unicode Standard.

This page summarizes the important changes for the Unicode Standard, Version 6.3.0.

The core specification was not republished for Version 6.3. Thus the chapters of the core specification use the Version 6.2.0 PDF files.
A. Summary
B. Version Information
C. Stability Policy Update
D. Textual Changes and Character Additions
E. Conformance Changes
F. Changes in the Unicode Character Database
G. Changes in the Unicode Standard Annexes
H. Changes in Synchronized Unicode Technical Standards

A. Summary

Version 6.3 of the Unicode Standard is a special release focused on delivering significantly improved bidirectional behavior.

Bidirectional Behavior Improvements

This new version updates the Unicode Bidirectional Algorithm to ensure that pairs of parentheses and brackets have consistent layout and to provide a mechanism for isolating runs of text.

The updated Bidirectional Algorithm, together with five newly introduced bidi format characters, will improve the display of text for hundreds of millions of users of Arabic, Hebrew, Persian, Urdu, and many other languages. The display and positioning of parentheses will better match the behavior that users expect. By using the new methods for isolating runs of text, software will be able to construct messages from different sources without jumbling the order of characters. The new bidi format characters correspond to features in markup (such as in CSS). Overall, these improvements bring greater interoperability and an improved ability to insert text and assemble user interface elements in these languages.
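A minimal sketch of the message-assembly case, in Python: each interpolated value is wrapped in an isolate initiator (U+2066 LRI, U+2067 RLI, or U+2068 FSI) and closed with U+2069 PDI, so that a right-to-left name cannot pull neighboring punctuation or numbers into its run. The helpers `isolate` and `format_notice` are hypothetical names chosen for this illustration, not part of any Unicode API; in markup, the corresponding features are CSS `unicode-bidi: isolate` and the HTML `<bdi>` element.

```python
# Explicit directional isolates introduced in Unicode 6.3.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE (direction from the first strong character inside)
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE (closes the most recent isolate)

_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text: str, direction: str = "auto") -> str:
    """Hypothetical helper: wrap text so the bidi algorithm treats it as an isolated run."""
    try:
        opener = _OPENERS[direction]
    except KeyError:
        raise ValueError(f"unknown direction: {direction!r}") from None
    return f"{opener}{text}{PDI}"


def format_notice(user_name: str, count: int) -> str:
    """Hypothetical UI string: the user-supplied name is isolated from the template text."""
    return f"{isolate(user_name)} - {count} reviews"


# A Hebrew user name followed by " - 3": without the isolate, the bidi algorithm
# could display the hyphen and number as if they belonged to the Hebrew run.
print(repr(format_notice("\u05d0\u05d1\u05d2", 3)))
```

FSI is a sensible default when the direction of the inserted text is not known in advance, because it takes its direction from the first strong character inside the isolate; LRI and RLI suit cases where the direction is already known.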

The improvements come with new rigor: the Consortium now offers two reference implementations and greatly improved testing and test data.

Other Enhancements

In a major enhancement for CJK usage, this new version adds standardized variation sequences for all 1,002 CJK compatibility ideographs. These sequences address a well-known issue of the CJK compatibility ideographs: they could change their appearance when any process normalized the text. Using the new standardized variation sequences allows authors to write text that preserves the specific required shapes of these CJK ideographs, even under Unicode normalization.
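The normalization issue can be reproduced with Python's `unicodedata` module. U+F900 has the singleton canonical decomposition U+8C48, so every normalization form replaces it with the unified ideograph, while a unified ideograph followed by a variation selector passes through unchanged. The specific sequence U+8C48 U+FE00 is our reading of StandardizedVariants.txt for U+F900 and should be checked against that data file; the normalization behavior illustrated here does not depend on which selector is used.

```python
import unicodedata

COMPAT = "\uF900"  # CJK COMPATIBILITY IDEOGRAPH-F900
BASE = "\u8C48"    # its canonical decomposition, a CJK unified ideograph
VS1 = "\uFE00"     # VARIATION SELECTOR-1
svs = BASE + VS1   # standardized variation sequence assumed for U+F900

# The compatibility ideograph carries a singleton canonical decomposition.
print(unicodedata.decomposition(COMPAT))  # prints 8C48

# Every normalization form therefore replaces it with the unified ideograph,
# and the distinction the author intended is lost.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, COMPAT) == BASE

# A base character followed by a variation selector is left untouched,
# so the request for the specific shape survives normalization.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, svs) == svs
```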

Version 6.3 includes other improvements as well:

  • Improved Unihan data to better align with ISO/IEC 10646
  • Better support for Hebrew word break behavior and for ideographic space in line breaking

This version also rolls in a change in Definition D136 (case-ignorable) of the core specification, various minor corrections for errata, and other small updates for the Unicode Character Database.

Synchronization

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard and have updates for Version 6.3:

  • UTS #10, Unicode Collation Algorithm
  • UTS #46, Unicode IDNA Compatibility Processing

This version of the Unicode Standard is synchronized with ISO/IEC 10646:2012, plus the accelerated publication of 5 bidirectional format control characters: U+061C ARABIC LETTER MARK and the isolate span controls U+2066..U+2069.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.

B. Version Information

Version 6.3 of the Unicode Standard consists of the core specification (unchanged from Version 6.2, except for Definition D136), the delta and archival code charts for this version, the Unicode Standard Annexes, and the Unicode Character Database (UCD).

The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.

Version 6.3.0 of the Unicode Standard should be referenced as:

The Unicode Consortium. The Unicode Standard, Version 6.3.0, (Mountain View, CA: The Unicode Consortium, 2013. ISBN 978-1-936213-08-5)
http://www.unicode.org/versions/Unicode6.3.0/

The terms “Version 6.3” or “Unicode 6.3” are abbreviations for the full version reference, Version 6.3.0.

The citation and permalink for the latest published version of the Unicode Standard is:

The Unicode Consortium. The Unicode Standard.
http://www.unicode.org/versions/latest/

A complete specification of the contributory files for Unicode 6.3 is found on the page Components for 6.3.0. That page also provides the recommended reference format for Unicode Standard Annexes.

The navigation bar on the left of this page provides links to the core specification as a single file, to individual chapters, and to the appendices. Also provided are links to the code charts, the radical-stroke indices to CJK ideographs, the Unicode Standard Annexes, and the data files for Version 6.3 of the Unicode Character Database.

Code Charts

Several sets of code charts are available. They serve different purposes:

  • The latest set of code charts for the Unicode Standard is available online. Those charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online index by character name is also provided.

For Unicode 6.3.0 in particular, two additional sets of code chart pages are provided:

  • A set of delta code charts showing the blocks in which bidirectional format controls were added for Unicode 6.3.0. Those characters are visually highlighted in the relevant chart. These delta code charts also include blocks that contain significant glyph changes to fix errata.
  • A set of archival code charts that represent the entire set of characters, names, and representative glyphs at the time of publication of Unicode 6.3.0.

The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.

Errata

Errata incorporated into Unicode 6.3 are listed by date in a separate table. For corrigenda and errata after the release of Unicode 6.3, see the list of current Updates and Errata.

C. Stability Policy Update

The statement of the stability policy for the Bidi_Class property was slightly reworded to clarify the exact type of changes allowed for it. This update is related to the changes in Unicode 6.3.0 for the Unicode Bidirectional Algorithm.

A constraint was added for the new Bidi_Paired_Bracket_Type (bpt) property, to guarantee that characters given either bpt=Open or bpt=Close (intended to be limited to paired brackets) also have Bidi_Class=ON and Bidi_Mirrored=Yes, for consistency.

A new constraint was added to guarantee that characters with the General_Category property value Number also have a Numeric_Type property value distinct from None.

For details about each of these changes or additions, see Property Value Stability.
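Python's `unicodedata` does not expose Numeric_Type directly, but `unicodedata.numeric` returns a value exactly for characters whose Numeric_Value is defined, that is, whose Numeric_Type is not None, which makes it a workable proxy. A sketch that checks the General_Category constraint against whatever UCD version the running Python ships with (the function name is hypothetical):

```python
import sys
import unicodedata


def number_without_numeric_value() -> list:
    """Code points whose General_Category is Nd, Nl, or No but which have no numeric value."""
    offenders = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # category() returns two-letter values; the Number group is N*.
        if unicodedata.category(ch).startswith("N") and unicodedata.numeric(ch, None) is None:
            offenders.append(cp)
    return offenders


print(f"UCD {unicodedata.unidata_version}: {len(number_without_numeric_value())} violations")
```

Because the constraint is part of the stability policy, the check is expected to report no violations for any UCD version from 6.3 on.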

Note: The Unicode Character Encoding Stability Policy restricts possible future changes to the Unicode Standard, but is not formally a part of the standard itself.

D. Textual Changes and Character Additions

In Version 6.3 of the core specification, Section 3.13, Default Case Algorithms, Definition D136 has been updated as follows:

D136. A character C is defined to be case-ignorable if C has the value MidLetter (ML), MidNumLet (MB), or Single_Quote (SQ) for the Word_Break property or its General_Category is one of Nonspacing_Mark (Mn), Enclosing_Mark (Me), Format (Cf), Modifier_Letter (Lm), or Modifier_Symbol (Sk).
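D136 combines a Word_Break test with a General_Category test. Python's `unicodedata` provides the category but not Word_Break, so this sketch substitutes a small, hypothetical excerpt of WordBreakProperty.txt; a real implementation would load the full file for the Unicode version it targets.

```python
import unicodedata

# Hypothetical, deliberately partial excerpt of Word_Break values.
# A real implementation would load WordBreakProperty.txt in full.
WORD_BREAK_EXCERPT = {
    "\u0027": "Single_Quote",  # APOSTROPHE (MidNumLet before 6.3)
    "\u002E": "MidNumLet",     # FULL STOP
    "\u2019": "MidNumLet",     # RIGHT SINGLE QUOTATION MARK
    "\u003A": "MidLetter",     # COLON
    "\u00B7": "MidLetter",     # MIDDLE DOT
    "\u02D7": "MidLetter",     # MODIFIER LETTER MINUS SIGN (MidLetter as of 6.3)
}

CASE_IGNORABLE_WORD_BREAK = {"MidLetter", "MidNumLet", "Single_Quote"}
CASE_IGNORABLE_GC = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch: str) -> bool:
    """D136 as worded for Unicode 6.3, limited by the partial Word_Break excerpt above."""
    if WORD_BREAK_EXCERPT.get(ch) in CASE_IGNORABLE_WORD_BREAK:
        return True
    return unicodedata.category(ch) in CASE_IGNORABLE_GC
```

Listing Single_Quote matters: U+0027 APOSTROPHE moved from MidNumLet to Single_Quote in 6.3, and its General_Category (Po) would not make it case-ignorable on its own.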

Changes in the Unicode Standard Annexes are listed in Section G.

Character Assignment Overview

Five new character assignments were made for the Unicode Standard, Version 6.3, as shown in the following table. This addition brings the total number of characters assigned in the standard to 110,122. (That is the traditional count, which totals up graphic and format characters, but omits surrogate code points, ISO control codes, noncharacters, and private-use allocations.)

U+061C ARABIC LETTER MARK
U+2066 LEFT-TO-RIGHT ISOLATE
U+2067 RIGHT-TO-LEFT ISOLATE
U+2068 FIRST STRONG ISOLATE
U+2069 POP DIRECTIONAL ISOLATE
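With a Python whose `unicodedata` module is built on Unicode 6.3 or later (Python 3.4 onward), the five additions and their key properties can be listed directly:

```python
import unicodedata

NEW_IN_6_3 = [0x061C, 0x2066, 0x2067, 0x2068, 0x2069]

for cp in NEW_IN_6_3:
    ch = chr(cp)
    # Character name, General_Category, and Bidi_Class from the built-in UCD.
    print(f"U+{cp:04X}  {unicodedata.name(ch):<24} "
          f"gc={unicodedata.category(ch)} bc={unicodedata.bidirectional(ch)}")
```

All five have General_Category Cf; U+061C has Bidi_Class AL, and U+2066..U+2069 carry the new values LRI, RLI, FSI, and PDI.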

No new blocks are defined in Version 6.3.

E. Conformance Changes

In Version 6.3 of the core specification, the derivation of the property Case_Ignorable in Definition D136 has been updated to account for the change in the Word_Break property value of U+0027 APOSTROPHE from MidNumLet to Single_Quote.

Except for the update to Definition D136, there are no significant conformance changes in the core specification. However, there are significant conformance changes to the Unicode Bidirectional Algorithm in UAX #9, which may also affect incidental discussion about the Unicode Bidirectional Algorithm in several sections of the core specification.

F. Changes in the Unicode Character Database

The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 6.3 can be found in UAX #44, Unicode Character Database. The changes listed there include character additions and property revisions to existing characters that will affect implementations. The most notable changes are summarized below.

Changes Related to the Unicode Bidirectional Algorithm

  • The five newly encoded characters are all Bidi_Control characters. U+061C ARABIC LETTER MARK (ALM) is similar to the bidirectional ordering control RLM, except that its Bidi_Class property value is AL rather than R. The explicit directional isolates U+2066..U+2069 mark a span of text as directionally isolated from its surroundings.
  • The Bidi_Class property has been extended with four new values for directional isolates.
  • Two new normative properties, Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type, have been introduced together with a new normative contributory file, BidiBrackets.txt, for the specification of bracket pairs in bidirectional text.
  • The General_Category property values of the floor and ceiling delimiters, U+2308..U+230B, have been changed from Sm to Ps or Pe, to form bidirectional bracket pairs.
  • A new conformance test data file has been added, BidiCharacterTest.txt, and the existing BidiTest.txt has been augmented with test cases containing new edge cases and the new Bidi_Class property values.
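BidiBrackets.txt uses the semicolon-separated layout of other UCD files: the code point, its Bidi_Paired_Bracket, and its Bidi_Paired_Bracket_Type (o for Open, c for Close), followed by a comment. This sketch parses a few lines in that layout, including the newly paired ceiling and floor delimiters, and cross-checks them with `unicodedata`; the sample text is an illustrative excerpt, not the full file, and `parse_bidi_brackets` is a hypothetical helper.

```python
import unicodedata
from typing import Dict, Tuple

# Illustrative lines in the BidiBrackets.txt layout.
SAMPLE = """\
# excerpt for illustration only
0028; 0029; o # LEFT PARENTHESIS
0029; 0028; c # RIGHT PARENTHESIS
2308; 2309; o # LEFT CEILING
2309; 2308; c # RIGHT CEILING
230A; 230B; o # LEFT FLOOR
230B; 230A; c # RIGHT FLOOR
"""


def parse_bidi_brackets(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each bracket to (its paired bracket, 'o' or 'c')."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cp, pair, bpt = (field.strip() for field in data.split(";"))
        table[chr(int(cp, 16))] = (chr(int(pair, 16)), bpt)
    return table


brackets = parse_bidi_brackets(SAMPLE)

for ch, (pair, bpt) in brackets.items():
    # Consistency guaranteed by the stability policy for bpt=Open/Close.
    assert unicodedata.bidirectional(ch) == "ON"
    assert unicodedata.mirrored(ch) == 1
    # Ceiling and floor are Ps/Pe from 6.3 on, like the parentheses.
    assert unicodedata.category(ch) == ("Ps" if bpt == "o" else "Pe")
```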

Changes Related to Line Breaking and Text Segmentation

  • The Line_Break property value of U+3000 IDEOGRAPHIC SPACE has been changed from ID to BA.
  • Hebrew letters and basic punctuation marks have been assigned the newly introduced Word_Break property values Hebrew_Letter, Single_Quote, and Double_Quote.
  • U+02D7 MODIFIER LETTER MINUS SIGN has been assigned the Word_Break property value MidLetter.
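To show what the new Hebrew_Letter, Single_Quote, and Double_Quote values are for, here is a toy word splitter that implements only the word-boundary rules touching them: letters stay together, plus rules WB7a to WB7c of UAX #29 as we understand them for 6.3 (Hebrew_Letter × Single_Quote; Hebrew_Letter × Double_Quote Hebrew_Letter; Hebrew_Letter Double_Quote × Hebrew_Letter). The classifier is a hypothetical simplification covering only U+05D0..U+05EA and the two ASCII quotes, and every other position is treated as a boundary, so this is not a conforming segmenter.

```python
def wb_class(ch: str) -> str:
    """Toy Word_Break classifier: basic Hebrew letters and the two ASCII quotes only."""
    if "\u05D0" <= ch <= "\u05EA":
        return "Hebrew_Letter"
    if ch == "'":
        return "Single_Quote"
    if ch == '"':
        return "Double_Quote"
    return "Other"


def no_break_before(text: str, i: int) -> bool:
    """True if a Hebrew-related rule forbids a boundary between text[i-1] and text[i]."""
    HL, SQ, DQ = "Hebrew_Letter", "Single_Quote", "Double_Quote"
    prev2 = wb_class(text[i - 2]) if i >= 2 else None
    prev = wb_class(text[i - 1])
    cur = wb_class(text[i])
    nxt = wb_class(text[i + 1]) if i + 1 < len(text) else None
    return (
        (prev == HL and cur == HL)                     # letters stay together
        or (prev == HL and cur == SQ)                  # WB7a: apostrophe used as geresh
        or (prev == HL and cur == DQ and nxt == HL)    # WB7b: quote used as gershayim...
        or (prev2 == HL and prev == DQ and cur == HL)  # WB7c: ...only between letters
    )


def split_words(text: str) -> list:
    """Split text at every position not protected by the rules above."""
    pieces, start = [], 0
    for i in range(1, len(text)):
        if not no_break_before(text, i):
            pieces.append(text[start:i])
            start = i
    if text:
        pieces.append(text[start:])
    return pieces


# The acronym written with a double quote in place of gershayim stays one word.
print(split_words('\u05E6\u05D4"\u05DC \u05E7\u05DD'))
```

A double quote after a Hebrew letter is kept inside the word only when another Hebrew letter follows, so a closing quotation mark at the end of a word is still split off.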

Changes Related to CJK Characters and the Unihan Database

  • A set of 245 new U-Source ideographs has been added.
  • A set of 1002 standardized variation sequences has been added, one sequence per CJK compatibility ideograph in Unicode 6.3. The sequences consist of CJK unified ideographs and variation selectors U+FE00..U+FE02, and have the intended visual appearance of the corresponding CJK compatibility ideographs.
  • The kHanyuPinlu fields have been revised systematically to use accents instead of numbers for tones.
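The change from tone numbers to accents follows the usual pinyin convention for tone-mark placement. The converter below is a hypothetical helper written for illustration, not part of the Unihan tooling; it assumes syllables are written with ü (not v or u:), applies the common placement rule (a or e if present, the o of ou, otherwise the last vowel), and normalizes to NFC so the result uses precomposed letters where they exist.

```python
import unicodedata

# Combining marks for tones 1-4; tone 5 (neutral) is written without a mark.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030C", "4": "\u0300", "5": ""}
VOWELS = "aeiou\u00FC"


def accent_syllable(syllable: str) -> str:
    """Convert e.g. 'lao3' to 'lǎo'; input without a trailing tone digit is returned unchanged."""
    if not syllable or syllable[-1] not in TONE_MARKS:
        return syllable
    base, mark = syllable[:-1], TONE_MARKS[syllable[-1]]
    lower = base.lower()
    if "a" in lower:
        idx = lower.index("a")
    elif "e" in lower:
        idx = lower.index("e")
    elif "ou" in lower:
        idx = lower.index("ou")  # the mark goes on the o
    else:
        positions = [i for i, c in enumerate(lower) if c in VOWELS]
        if not positions:
            return base  # syllabic consonants such as 'ng' are left unmarked here
        idx = positions[-1]
    # Insert the combining mark after the chosen vowel, then compose.
    return unicodedata.normalize("NFC", base[: idx + 1] + mark + base[idx + 1 :])
```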

Miscellaneous Changes

  • Mongolian and Phags-pa characters have been given a Joining_Type classification for contextual shaping. As a part of these additions, one Phags-pa character has the Joining_Type value of L (Left Joining), which no character had been assigned before. This change may impact the implementations of cursive rendering engines.
  • The General_Category property value of U+180E MONGOLIAN VOWEL SEPARATOR has been changed from Zs to Cf. The values of other related properties such as Bidi_Class, White_Space, and Other_Default_Ignorable_Code_Point have been updated accordingly.
  • The unassigned code points in the Currency Symbols block have been given the Bidi_Class property value ET and the Line_Break property value PR, to help implementations support new currency symbols, when they are encoded.
  • Nine named character sequences have been added for Uighur and Chagatai.
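One practical consequence of the U+180E change: code that classifies whitespace from UCD data behaves differently depending on the UCD version it was built against. Python's `str.isspace`, for example, is documented as true for characters with General_Category Zs or Bidi_Class WS, B, or S, so U+180E stops counting as whitespace once the interpreter's UCD is 6.3 or later (Python 3.4 onward). A quick check:

```python
import unicodedata

MVS = "\u180E"  # MONGOLIAN VOWEL SEPARATOR

print(unicodedata.unidata_version)
print(unicodedata.category(MVS))       # Zs before 6.3, Cf from 6.3 on
print(unicodedata.bidirectional(MVS))  # WS before 6.3, BN from 6.3 on
print(MVS.isspace())                   # False once the UCD is 6.3 or later
```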

G. Changes in the Unicode Standard Annexes

In Version 6.3, many of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.

  • UAX #9, Unicode Bidirectional Algorithm: The Unicode Bidirectional Algorithm was substantially extended to support isolate runs and to resolve paired brackets as a unit. For the former extension, four new Bidi_Class property values were added. For the latter, two normative properties and an algorithm rule N0 were introduced. Additional definitions, rule revisions, notes, and examples were included, and a new test file was added.
  • UAX #11, East Asian Width
  • UAX #14, Unicode Line Breaking Algorithm: The description of the CM class was updated to reflect a refinement in line breaking for U+3035 VERTICAL KANA REPEAT MARK LOWER HALF, and the description of the BA class was updated to reflect a change for U+3000 IDEOGRAPHIC SPACE.
  • UAX #15, Unicode Normalization Forms
  • UAX #24, Unicode Script Property
  • UAX #29, Unicode Text Segmentation: Minor updates were made for word segmentation. Apostrophe and double quote are now allowed within a strictly Hebrew word context, to reflect their common use in place of geresh and gershayim.
  • UAX #31, Unicode Identifier and Pattern Syntax
  • UAX #34, Unicode Named Character Sequences
  • UAX #38, Unicode Han Database (Unihan): The status of kCompatibilityVariant was clarified. kHanyuPinlu was changed to use accents instead of numbers for tones, and the regular expression for it was modified accordingly. Many other minor documentation updates were made.
  • UAX #41, Common References for Unicode Standard Annexes: Minor updates were made to the references.
  • UAX #42, Unicode Character Database in XML: Changes were made to track additional properties and property values for the Unicode Bidirectional Algorithm.
  • UAX #44, Unicode Character Database: The status of default values was clarified. Numerous changes were made to reflect changes to the Unicode Bidirectional Algorithm and its associated character properties and data files. A clarification was added about Numeric_Type=Digit.
  • UAX #45, U-Source Ideographs: 245 characters were added to the list of U-Source ideographs. A new status of UNC-2013 was added and documented.

H. Changes in Synchronized Unicode Technical Standards

There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.

  • UTS #10, Unicode Collation Algorithm: The CLDR root collation data files contained in CollationAuxiliary.zip, along with the related documentation, have been moved from the UCA release directory to the root collation data files in the CLDR repository. Trailing collation elements are now given regular tertiary weights in DUCET, which allows for full case differences among compatibility characters. Digits from all scripts are now given the same weights as ASCII digits in DUCET, rather than being distinguished by secondary weights. The IgnoreSP option for handling variables (intended for ignoring punctuation but not symbols) has been removed. The weights 0xFFFD..0xFFFF are now reserved for special collation elements. In addition, the text of UTS #10 has been reorganized for better flow.
  • UTS #46, Unicode IDNA Compatibility Processing: The five new bidirectional format controls were added. They are given the value ignored in IdnaMappingTable.txt. They have the status disallowed in IDNA2008.
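In UTS #46 processing, code points with the status ignored are removed during the mapping step. A sketch of that one step, assuming a hypothetical lookup that contains only the five 6.3 bidi controls; a real implementation would read every status (valid, mapped, deviation, disallowed, and so on) from the full IdnaMappingTable.txt.

```python
# Hypothetical excerpt of IdnaMappingTable.txt: only the five bidi controls
# added in Unicode 6.3, all of which have the status "ignored".
IGNORED = {"\u061C", "\u2066", "\u2067", "\u2068", "\u2069"}


def drop_ignored(label: str) -> str:
    """Mapping-step sketch: code points with status 'ignored' are removed."""
    return "".join(ch for ch in label if ch not in IGNORED)


print(drop_ignored("ex\u2066am\u2069ple"))  # prints example
```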

