


Unicode Technical Note #27

Known Anomalies in Unicode Character Names

Version: 8
Authors: Asmus Freytag, Rick McGowan, and Ken Whistler
Date: August 13, 2024
This Version: https://www.unicode.org/notes/tn27/tn27-8.html
Previous Version: https://www.unicode.org/notes/tn27/tn27-7.html
Latest Version: https://www.unicode.org/notes/tn27/

Summary

This document provides information on many known anomalies in the formal character names in the Unicode Standard.

Status

This document is a Unicode Technical Note. Sole responsibility for its contents rests with the author(s). Publication does not imply any endorsement by the Unicode Consortium.

For information on Unicode Technical Notes, including criteria for acceptance, see https://www.unicode.org/notes/.


Introduction

In this document we list all Unicode character names known, at the time of writing, to contain clerical errors in their spelling. In addition, we have compiled information on many misnamed characters, misleading character names, and characters with other known problems in their names.

Because the Unicode Standard is a character encoding standard and not the Universal Encyclopedia of Writing Systems and Character Identity, the stability and uniqueness of published character names are far more important than their correctness. The published character names are normative for the purposes of the Unicode Standard and of the many other IT standards that reference it. These standards require stable identifiers, so character names must be immutable: changing a character name is almost as disruptive to these standards as changing a character's code point would be. Accordingly, the Unicode Consortium has adopted the Name Stability Policy, which prevents changes to character names. As a result, errors in character names cannot be corrected. Instead, important character name anomalies are documented with annotations in the Unicode Character Code Charts.
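Any software that exposes character names shows the effect of the Name Stability Policy. The following is a minimal Python sketch. It assumes Python 3.3 or later and uses the standard `unicodedata` module. `is_character_name` is a hypothetical helper written for this illustration, not part of any library. The sketch shows that the published, misspelled name of U+039B is the one that works as an identifier.

```python
import unicodedata


def is_character_name(name: str) -> bool:
    """Hypothetical helper: True if unicodedata.lookup() accepts `name`."""
    try:
        unicodedata.lookup(name)
    except KeyError:
        return False
    return True


capital_lamda = "\u039b"

# The normative name comes back exactly as published, misspelling included.
print(unicodedata.name(capital_lamda))  # GREEK CAPITAL LETTER LAMDA

# The published spelling works as an identifier; the "fixed" spelling does not,
# because the name was never changed to it.
print(is_character_name("GREEK CAPITAL LETTER LAMDA"))   # True
print(is_character_name("GREEK CAPITAL LETTER LAMBDA"))  # False
```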

The requirement for a unique and stable character name that can be used as a formal identifier does not mean that the Unicode Standard dictates to anyone what the name of any given letter in their writing system should properly be, whether in English or in any other language. The Unicode Code Charts also provide informative aliases for a large number of characters whose names are neither anomalous nor defective, because different user communities often use different names for the same character, even in English.

One reason the Unicode Standard publishes many informative aliases in the Unicode names list is that there are often much better, more communicative names for particular characters, even in English, than the normative names in the data file. For example, U+002F SOLIDUS is more widely known among its American users as slash. Informative aliases are useful in describing a character, but they cannot be used as identifiers, because they are not guaranteed to be unique or stable. Users are free to use such aliases and other names, as long as these are not misrepresented as corrections to the standard but are instead used as alternative, more useful names for characters in the standard.
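The sketch below shows one way to keep informative names separate from identifiers. It uses Python and the standard `unicodedata` module to pair application-chosen friendly names with the normative names. The `FRIENDLY_NAMES` table and the `describe` function are hypothetical. The friendly name is displayed alongside the normative name rather than substituted for it.

```python
import unicodedata

# Hypothetical application-level table of friendly names. These are
# informative labels chosen by the application, not Unicode identifiers.
FRIENDLY_NAMES = {"slash": "/", "hash": "#"}


def describe(friendly: str) -> str:
    """Show a friendly name next to the character's normative name."""
    ch = FRIENDLY_NAMES[friendly]
    return f"{friendly}: U+{ord(ch):04X} {unicodedata.name(ch)}"


for friendly in FRIENDLY_NAMES:
    print(describe(friendly))
# slash: U+002F SOLIDUS
# hash: U+0023 NUMBER SIGN

# The friendly name is not a formal identifier, so name lookup rejects it.
try:
    unicodedata.lookup("SLASH")
    slash_is_identifier = True
except KeyError:
    slash_is_identifier = False
print(slash_is_identifier)  # False
```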

For character names that were encoded with misspelled words or that exhibit other serious errors, the Unicode Standard has adopted normative character name aliases. These formal name aliases can be used as alternative, normative identifiers for the characters, without the need to preserve the original misspelling or other error in the character name. While this means that some characters can have more than one identifier, each identifier continues to refer uniquely to a single character. Formal name aliases are documented in the NameAliases.txt file in the Unicode Character Database and also in the Unicode Code Charts. We have not documented them all here; instead, we merely indicate which characters had formal aliases at the time of this writing.
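The NameAliases.txt data file uses a simple semicolon-separated format: code point, alias, and alias type. The type `correction` marks the formal aliases for erroneous names discussed here. The Python sketch below parses text in that format. The sample lines are modeled on entries in the file and should be checked against the current UCD release. `parse_name_aliases` and `corrections` are hypothetical helpers. The sketch assumes Python 3.3 or later, whose `unicodedata.lookup()` accepts formal name aliases while `unicodedata.name()` still returns the original name.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict

# A few lines in the NameAliases.txt format: code point;alias;type
SAMPLE = """\
# Comment lines start with '#'
01A2;LATIN CAPITAL LETTER GHA;correction
01A3;LATIN SMALL LETTER GHA;correction
0000;NULL;control
0000;NUL;abbreviation
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
"""


def parse_name_aliases(text: str) -> dict[int, list[tuple[str, str]]]:
    """Map each code point to its (alias, type) pairs, in file order."""
    aliases: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        code, alias, kind = (field.strip() for field in line.split(";"))
        aliases[int(code, 16)].append((alias, kind))
    return dict(aliases)


def corrections(aliases: dict[int, list[tuple[str, str]]]) -> dict[int, list[str]]:
    """Keep only the aliases whose type is 'correction'."""
    result: dict[int, list[str]] = {}
    for cp, pairs in aliases.items():
        fixed = [alias for alias, kind in pairs if kind == "correction"]
        if fixed:
            result[cp] = fixed
    return result


table = parse_name_aliases(SAMPLE)
for cp, names in corrections(table).items():
    print(f"U+{cp:04X} -> {', '.join(names)}")
# U+01A2 -> LATIN CAPITAL LETTER GHA
# U+01A3 -> LATIN SMALL LETTER GHA

# lookup() resolves the formal alias; name() still reports the original name.
ch = unicodedata.lookup("LATIN CAPITAL LETTER GHA")
print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```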

In some cases, annotations have been added to the names list in the Unicode Standard to document various lesser problems, but to date there has been no full listing of all known problems.

The authors therefore intend this Technical Note to serve as a convenient summary of the information about character name anomalies in the Unicode Standard at the time of its writing. It will be updated from time to time as additional anomalies become known. While the information in this technical note is based on information published in the Unicode Standard, the selection and manner of presentation in this document reflect choices made by its authors; it does not in any way supersede the information in the Unicode Standard.

List of Known Anomalies and Explanations

This section lists character names with known anomalies, including those for which a formal name alias has been defined. It provides further information about some names that have been the objects of discussion or inquiry. As issues are reported, additional entries may be added at any time and without notice. While many of the explanations below are based on annotations in the Unicode code charts, they have been edited or re-stated by the authors.

U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

U+01A2 LATIN CAPITAL LETTER
U+01A3 LATIN SMALL LETTER

U+01BE LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE

U+0238 LATIN SMALL LETTER DB DIGRAPH
U+0239 LATIN SMALL LETTER QP DIGRAPH

U+025B LATIN SMALL LETTER OPEN E

U+025E LATIN SMALL LETTER CLOSED REVERSED OPEN E

U+0285 LATIN SMALL LETTER SQUAT REVERSED ESH

U+02C7 CARON
U+030C COMBINING CARON

U+034F COMBINING GRAPHEME JOINER

U+039B GREEK CAPITAL LETTER LAMDA
U+03BB GREEK SMALL LETTER LAMDA

U+04A5 CYRILLIC SMALL LIGATURE EN GHE
U+04B5 CYRILLIC SMALL LIGATURE TE TSE
U+04D5 CYRILLIC SMALL LIGATURE A IE

U+0598 HEBREW ACCENT ZARQA

U+05AE HEBREW ACCENT ZINOR

U+0670 ARABIC LETTER SUPERSCRIPT ALEF

U+06C0 ARABIC LETTER HEH WITH YEH ABOVE
U+06C2 ARABIC LETTER HEH GOAL WITH HAMZA ABOVE
U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE

U+0709 SYRIAC SUBLINEAR COLON SKEWED

U+0964 DEVANAGARI DANDA
U+0965 DEVANAGARI DOUBLE DANDA

U+0A01 GURMUKHI SIGN ADAK BINDI

U+0B83 TAMIL SIGN VISARGA

U+0CDE KANNADA LETTER

U+0E9D LAO LETTER

U+0E9F LAO LETTER

U+0EA3 LAO LETTER

U+0EA5 LAO LETTER

U+0F0A TIBETAN MARK BKA- SHOG YIG MGO

U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG

U+0F0C TIBETAN MARK DELIMITER TSHEG BSTAR

U+0FD0 TIBETAN MARK BKA- SHOG GI MGO RGYAN

U+11EC HANGUL JONGSEONG -KIYEOK
U+11ED HANGUL JONGSEONG -SSANGKIYEOK
U+11EE HANGUL JONGSEONG SSANG
U+11EF HANGUL JONGSEONG -KHIEUKH

U+156F CANADIAN SYLLABICS TTH

U+178E KHMER LETTER NNO

U+179E KHMER LETTER SSO

U+1BBD SUNDANESE LETTER

U+200B ZERO WIDTH SPACE

U+2113 SCRIPT SMALL L

U+2118 SCRIPT CAPITAL P

U+234A APL FUNCTIONAL SYMBOL TACK UNDERBAR
U+234E APL FUNCTIONAL SYMBOL TACK JOT
U+2351 APL FUNCTIONAL SYMBOL TACK OVERBAR
U+2355 APL FUNCTIONAL SYMBOL TACK JOT
U+2361 APL FUNCTIONAL SYMBOL TACK DIAERESIS

U+2448 OCR DASH
U+2449 OCR CUSTOMER ACCOUNT NUMBER

U+2629 CROSS OF JERUSALEM

U+262B FARSI SYMBOL

U+2B7A LEFTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE STROKE
U+2B7C RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE STROKE

U+3021 HANGZHOU NUMERAL ONE
U+3022 HANGZHOU NUMERAL TWO
U+3023 HANGZHOU NUMERAL THREE
U+3024 HANGZHOU NUMERAL FOUR
U+3025 HANGZHOU NUMERAL FIVE
U+3026 HANGZHOU NUMERAL SIX
U+3027 HANGZHOU NUMERAL SEVEN
U+3028 HANGZHOU NUMERAL EIGHT
U+3029 HANGZHOU NUMERAL NINE
U+3038 HANGZHOU NUMERAL TEN
U+3039 HANGZHOU NUMERAL TWENTY
U+303A HANGZHOU NUMERAL THIRTY

U+3036 CIRCLED POSTAL MARK

U+327C CIRCLED KOREAN CHARACTER CHAMKO
U+327D CIRCLED KOREAN CHARACTER JUEUI

U+A015 YI SYLLABLE

U+AA6E MYANMAR LETTER KHAMTI

U+FA0E CJK COMPATIBILITY IDEOGRAPH-FA0E
U+FA0F CJK COMPATIBILITY IDEOGRAPH-FA0F
U+FA11 CJK COMPATIBILITY IDEOGRAPH-FA11
U+FA13 CJK COMPATIBILITY IDEOGRAPH-FA13
U+FA14 CJK COMPATIBILITY IDEOGRAPH-FA14
U+FA1F CJK COMPATIBILITY IDEOGRAPH-FA1F
U+FA21 CJK COMPATIBILITY IDEOGRAPH-FA21
U+FA23 CJK COMPATIBILITY IDEOGRAPH-FA23
U+FA24 CJK COMPATIBILITY IDEOGRAPH-FA24
U+FA27 CJK COMPATIBILITY IDEOGRAPH-FA27
U+FA28 CJK COMPATIBILITY IDEOGRAPH-FA28
U+FA29 CJK COMPATIBILITY IDEOGRAPH-FA29

U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAET

U+FEFF ZERO WIDTH NO-BREAK SPACE

U+122D4 CUNEIFORM SIGN TENU
U+122D5 CUNEIFORM SIGN OVER BUR OVER BUR

U+12327 CUNEIFORM SIGN UN GUNU

U+1680B BAMUM LETTER PHASE-A MAEMGBIEE

U+16E56 MEDEFAIDRIN CAPITAL LETTER H
U+16E57 MEDEFAIDRIN CAPITAL LETTER N
U+16E76 MEDEFAIDRIN SMALL LETTER H
U+16E77 MEDEFAIDRIN SMALL LETTER N

U+1B001 HIRAGANA LETTER ARCHAIC YE

U+1D0C5 BYZANTINE MUSICAL SYMBOL FORA SKLIRON CHROMA VASIS

U+1D300 MONOGRAM FOR
U+1D301 DIGRAM FOR HEAVENLY
U+1D302 DIGRAM FOR
U+1D303 DIGRAM FOR HEAVEN
U+1D304 DIGRAM FOR
U+1D305 DIGRAM FOR

U+1D300 MONOGRAM FOR HUMAN
U+1D301 DIGRAM FOR HEAVENLY HUMAN
U+1D302 DIGRAM FOR EARTHLY HUMAN
U+1D303 DIGRAM FOR HUMANLY HEAVEN
U+1D304 DIGRAM FOR HUMANLY EARTH
U+1D305 DIGRAM FOR HUMANLY HUMAN

U+1D6B2 MATHEMATICAL BOLD CAPITAL LAMDA
U+1D6CC MATHEMATICAL BOLD SMALL LAMDA
U+1D6EC MATHEMATICAL ITALIC CAPITAL LAMDA
U+1D706 MATHEMATICAL ITALIC SMALL LAMDA
U+1D726 MATHEMATICAL BOLD ITALIC CAPITAL LAMDA
U+1D740 MATHEMATICAL BOLD ITALIC SMALL LAMDA
U+1D760 MATHEMATICAL SANS-SERIF BOLD CAPITAL LAMDA
U+1D77A MATHEMATICAL SANS-SERIF BOLD SMALL LAMDA
U+1D79A MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL LAMDA

U+1E899 MENDE KIKAKUI SYLLABLE M172
U+1E89A MENDE KIKAKUI SYLLABLE M174
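Each anomalous spelling is preserved verbatim in the normative names, so a whole family of affected characters can be found mechanically. The LAMDA entries in the list above are one example. The Python sketch below scans every code point known to the interpreter's `unicodedata` module. `names_containing` is a hypothetical helper. Its results depend on the Unicode version bundled with the running Python, which `unicodedata.unidata_version` reports.

```python
from __future__ import annotations

import sys
import unicodedata


def names_containing(word: str) -> dict[int, str]:
    """Return {code point: name} for every character whose normative name
    contains `word` as a whole space- or hyphen-separated word."""
    found: dict[int, str] = {}
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned/unnamed
        if word in name.replace("-", " ").split():
            found[cp] = name
    return found


lamda = names_containing("LAMDA")
print(f"Unicode {unicodedata.unidata_version}: {len(lamda)} names contain LAMDA")
for cp, name in lamda.items():
    print(f"U+{cp:04X} {name}")
```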

 

Appendix A: Notes on Zarqa and Zinor

There are two separate cantillation systems in the Hebrew Bible. One is used for Psalms, Proverbs, and (most of) Job (the "poetic" books, hence the "poetic system"). The other is used everywhere else. The two systems have structural similarities and share some graphemes, but not all. In modern printing the accents have roughly the same shape, but old manuscripts wrote them slightly differently. The prose system has an accent called ZARQA, which is postposed (placed on or to the left of the last letter). The poetic system has one called TSINOR (also called zarqa, and vice versa; each of these accents has many names). It has the same shape and placement, and even an analogous function in the structure of the cantillation. Another accent, found only in the poetic system, is called TSINNORIT (a diminutive of tsinor). It occurs directly above its letter and is (almost?) never on the last letter of its word. (More modern printing tends to put the zarqa right on top of its letter too, but that is just a printing preference.) If you look closely at some old manuscripts, you can tell that the tsinnorit has a slightly different shape from the zarqa/tsinor.

As encoded in Unicode, there are ZARQA (U+0598) and ZINOR (U+05AE) [sic]. By the usual meanings of those names, they should properly be synonyms for the same accent, but they are not. While the word "zinor" would be mnemonic of "tsinnorit," the character names have it the wrong way around: ZINOR has the combining class of above-postposed, and ZARQA is encoded to go directly above the letter. So, to encode a zarqa or a tsinor, you need to use ZINOR, and to encode a tsinnorit, you need to use ZARQA.
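The crossover described above can be captured in a small lookup table, so that text-entry or conversion code picks the right character from the traditional accent name. This Python sketch is hypothetical: `ACCENT_FOR` and `encode_accent` are illustrative names, and the traditional names are transliterated as in this appendix. It prints each mapping with its normative name and canonical combining class from `unicodedata`, so the placement difference can be checked.

```python
import unicodedata

ZINOR = "\u05ae"  # HEBREW ACCENT ZINOR: above-postposed
ZARQA = "\u0598"  # HEBREW ACCENT ZARQA: directly above its letter

# Hypothetical table: traditional accent name -> character to encode.
# Note the crossover: the traditional names do not match the Unicode names.
ACCENT_FOR = {
    "zarqa": ZINOR,      # prose-system zarqa
    "tsinor": ZINOR,     # poetic-system tsinor
    "tsinnorit": ZARQA,  # poetic-system tsinnorit
}


def encode_accent(letter: str, accent: str) -> str:
    """Append the combining mark used for the traditional `accent` to `letter`."""
    return letter + ACCENT_FOR[accent.lower()]


for accent, mark in ACCENT_FOR.items():
    print(f"{accent:10} -> U+{ord(mark):04X} {unicodedata.name(mark)} "
          f"(ccc={unicodedata.combining(mark)})")
```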

Acknowledgements

Thanks to John Hudson, James Kass, KAWABATA Taichi, Ken Lunde, Robin Leroy, Marc Lodewijck, Artur Q.A., Mark Shoulson, and Andrew West for their contributions.

Modifications

The following summarizes modifications from the previous version of this document.

Revision 8

Revision 7

Revision 6

Revision 4

Revision 3

Revision 2

Revision 1


© 2006–2024 Asmus Freytag, Rick McGowan, Ken Whistler. This publication is protected by copyright, and permission must be obtained from the author and Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the Terms of Use.

Use of this publication is governed by the Unicode Terms of Use. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.

Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.

