Numerals in Unicode

From Wikipedia, the free encyclopedia

Graphemes for various number systems

A numeral (often called number in Unicode) is a character that denotes a number. The decimal number digits 0–9 are used widely in various writing systems throughout the world; however, the graphemes representing the decimal digits differ widely. Therefore, Unicode includes 22 different sets of graphemes for the decimal digits, as well as various decimal points, thousands separators, negative signs, etc. Unicode also includes several non-decimal numerals such as Aegean numerals, Roman numerals, counting rod numerals, Mayan numerals, Cuneiform numerals and ancient Greek numerals. There are also many typographical variations of the Western Arabic numerals provided for specialized mathematical use and for compatibility with earlier character sets, such as ² or ②, and composite characters such as ½.

Numerals by numeric property

Grouped by their numerical property as used in a text, Unicode has four values for Numeric Type. First there is the "not a number" type. Then there are decimal-radix numbers, commonly used in Western-style decimals (plain 0–9); numbers that are not part of a decimal system, such as Roman numerals; and decimal numbers in a typographic context, such as encircled numbers. Lettered sequences such as "A. B. C." used for chapter numbering are not treated as numeric.

Numeric Type^[a]^[b] (Unicode character property)
Numeric type	Code	Has numeric value	Example	Remarks
Not numeric	`<none>`	No	A X (Latin) ! Д μ に	Numeric Value="NaN"
Decimal	`De`	Yes	0 1 9 ६ (Devanagari 6) ೬ (Kannada 6) 𝟨 (Mathematical, styled sans serif)	Straight digit (decimal-radix). Corresponds both ways with General Category=Nd^[a]
Digit	`Di`	Yes	¹ (superscript) ① ⒈ (digit with full stop)	Decimal, but in typographic context
Numeric	`Nu`	Yes	¾ ௰ (Tamil number ten) Ⅹ (Roman numeral) 六 (Han number 6)	Numeric value, but not decimal-radix
a.^ "Section 4.6: Numeric Value". The Unicode Standard. Unicode Consortium. September 2025.
b.^ "Unicode Derived Numeric Types". Unicode Character Database. Unicode Consortium. 2025-06-30.
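
The three numeric types in the table can be approximated with Python's standard `unicodedata` module. This is a minimal sketch: the module does not expose the Numeric Type property directly, so the helper `numeric_type` (a name made up for this illustration) infers it from which of the three lookup functions returns a value.

```python
import unicodedata

def numeric_type(ch: str):
    """Infer the Numeric Type code ('De', 'Di', 'Nu') of one character.

    Returns None for characters without a numeric value. The inference
    relies on unicodedata.decimal/digit/numeric, which is an assumption
    of this sketch, not an official property lookup.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "De"   # decimal-radix digit
    if unicodedata.digit(ch, None) is not None:
        return "Di"   # digit in a typographic context
    if unicodedata.numeric(ch, None) is not None:
        return "Nu"   # numeric value, but not decimal-radix
    return None

for ch in "A0\u096C\u00B9\u2460\u00BE\u2169":
    print(f"U+{ord(ch):04X}", numeric_type(ch), unicodedata.numeric(ch, None))
```

The loop covers one example from each row of the table: a Latin letter, two decimal digits, a superscript, a circled digit, a vulgar fraction and a Roman numeral.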

Hexadecimal digits

Hexadecimal digits in Unicode are not separate characters; existing letters and numbers are used. These characters are marked with the character property Hex_Digit=Yes, and also with ASCII_Hex_Digit=Yes when appropriate.

Characters in Unicode marked `Hex_Digit=Yes`^[a]
`0123456789ABCDEF`	Basic Latin, capitals	Also `ASCII_Hex_Digit=Yes`
`0123456789abcdef`	Basic Latin, small letters	Also `ASCII_Hex_Digit=Yes`
`０１２３４５６７８９ＡＢＣＤＥＦ`	Fullwidth forms, capitals
`０１２３４５６７８９ａｂｃｄｅｆ`	Fullwidth forms, small letters
a.^ "Unicode 17.0 UCD: PropList.txt". 2025-06-30. Retrieved 2025-09-11.
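
Python's `unicodedata` module has no lookup for `Hex_Digit`, so the sketch below rebuilds the set from the table above. It assumes only that the fullwidth forms sit at a fixed offset of 0xFEE0 from their Basic Latin counterparts; the names `HEX_DIGITS` and `parse_hex` are hypothetical.

```python
import unicodedata

ASCII_HEX = "0123456789ABCDEFabcdef"
# Fullwidth forms are offset by 0xFEE0 from the Basic Latin characters.
FULLWIDTH_HEX = "".join(chr(ord(c) + 0xFEE0) for c in ASCII_HEX)

ASCII_HEX_DIGITS = frozenset(ASCII_HEX)            # ASCII_Hex_Digit=Yes
HEX_DIGITS = frozenset(ASCII_HEX + FULLWIDTH_HEX)  # Hex_Digit=Yes

def parse_hex(text: str) -> int:
    """Parse a string made only of Hex_Digit characters."""
    if not text or any(ch not in HEX_DIGITS for ch in text):
        raise ValueError(f"not a hexadecimal string: {text!r}")
    # NFKC folds the fullwidth forms to Basic Latin before conversion.
    return int(unicodedata.normalize("NFKC", text), 16)

print(parse_hex("1f"), parse_hex("\uFF26\uFF26"))
```

Checking membership first keeps `int()` from accepting digits of other scripts, which are decimal digits but not hexadecimal digits.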

Numerals by script

Hindu–Arabic numerals

The Hindu–Arabic numeral system involves ten digits representing 0–9. Unicode includes the Western Arabic numerals in the Basic Latin (or ASCII derived) block. The digits are repeated in several other scripts: Eastern Arabic, Balinese, Bengali, Devanagari, Ethiopic, Gujarati, Gurmukhi, Telugu, Khmer, Lao, Limbu, Malayalam, Mongolian, Myanmar, New Tai Lue, Nko, Oriya, Thai, Tibetan, Osmanya. Unicode includes a numeric value property for each digit to assist in collation and other text processing operations. However, there is no mapping between the various related digits.
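
Because the digits carry a numeric value but no direct mapping to one another, software usually goes through that value. The following is a minimal sketch using Python's `unicodedata.decimal`; the function name `to_western_digits` is made up for this example.

```python
import unicodedata

def to_western_digits(text: str) -> str:
    """Replace every decimal digit, in any script, with its ASCII digit."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digits
        out.append(str(value) if value is not None else ch)
    return "".join(out)

print(to_western_digits("\u0967\u0968\u0969"))  # Devanagari digits
print(to_western_digits("\u0664\u0665\u0666"))  # Eastern Arabic digits
```

Characters that are not decimal digits pass through unchanged, so the function can be applied to running text.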

Although Arabic is written from right to left, while English is written left to right, in both languages numbers are written with the most significant digit on the left and the least significant on the right.

Fractions

The U+2044 ⁄ FRACTION SLASH allows authors using Unicode to compose any arbitrary fraction from the decimal digits. It is intended to instruct font rendering to make the surrounding digits smaller, raising those on the left and lowering those on the right, but this is rarely implemented. (A workaround is to use the superscript and subscript characters described below, but only Arabic numerals are available.) Unicode also includes a handful of vulgar fractions as compatibility characters, but discourages their use.
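
Both approaches from the paragraph above can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes non-negative integers only.

```python
FRACTION_SLASH = "\u2044"

# Translation tables from ASCII digits to superscript and subscript digits.
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

def plain_fraction(numerator: int, denominator: int) -> str:
    """Digits joined by FRACTION SLASH; styling is left to the font."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"

def styled_fraction(numerator: int, denominator: int) -> str:
    """Workaround: superscript numerator and subscript denominator."""
    return (str(numerator).translate(SUPERSCRIPTS)
            + FRACTION_SLASH
            + str(denominator).translate(SUBSCRIPTS))

print(plain_fraction(3, 4), styled_fraction(3, 4))
```

The first form depends on font support for the fraction slash; the second looks like a fraction in most fonts but no longer contains ordinary digits.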

Decimal fractions

Several characters in Unicode can serve as a decimal separator depending on the locale. Decimal fractions are represented in text as a sequence of decimal digit numerals with a decimal separator separating the whole-number portion from the fractional portion. For example, the decimal fraction for ¼ is expressed as zero-point-two-five ("0.25"). Unicode has no dedicated general decimal separator but unifies the decimal separator function with other punctuation characters. So the "." used in "0.25" is the same period character (U+002E) used to end a sentence. However, cultures vary in the glyph or grapheme used for a decimal separator. In some locales, the comma (U+002C) may be used instead: "0,25". Still other locales use a space (or non-breaking space), as in "0 25". The Arabic writing system includes a dedicated character for a decimal separator that looks much like a comma, U+066B ٫ ARABIC DECIMAL SEPARATOR, which when combined with the Arabic digits to express one-quarter appears as: "٠٫٢٥".
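
A parser that accepts several of these separators might look like the sketch below. Treating the comma as a decimal separator is an assumption that only holds for some locales (elsewhere it groups thousands), and the name `parse_decimal` is hypothetical.

```python
import unicodedata
from decimal import Decimal

# Assumed set of decimal separators: period, comma, ARABIC DECIMAL SEPARATOR.
DECIMAL_SEPARATORS = {".", ",", "\u066B"}

def parse_decimal(text: str) -> Decimal:
    """Parse digits of any script with at most one decimal separator."""
    parts = []
    for ch in text:
        if ch in DECIMAL_SEPARATORS:
            parts.append(".")
            continue
        value = unicodedata.decimal(ch, None)
        if value is None:
            raise ValueError(f"unexpected character {ch!r}")
        parts.append(str(value))
    if parts.count(".") > 1:
        raise ValueError("more than one decimal separator")
    return Decimal("".join(parts))

print(parse_decimal("0,25"), parse_decimal("\u0660\u066B\u0662\u0665"))
```

Real applications would normally take the separator from locale data rather than accept all candidates at once.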

Characters for mathematical constants

Currently, three Unicode characters semantically represent mathematical constants: U+210E ℎ PLANCK CONSTANT, U+210F ℏ PLANCK CONSTANT OVER TWO PI, and U+2107 ℇ EULER CONSTANT (of unknown significance^[1]). Other mathematical constants can be represented using characters that have multiple semantic uses. For example, although Unicode includes a character for the natural exponent ℯ (U+212F), its UCS canonical name derives from its glyph: U+212F ℯ SCRIPT SMALL E; and the mathematical constant π, 3.141592..., is represented by U+03C0 π GREEK SMALL LETTER PI.
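
The character names mentioned above can be checked with Python's `unicodedata.name`. This short sketch only looks up names; the variable names are illustrative.

```python
import unicodedata

# The three "constant" characters, then script small e and Greek pi.
CANDIDATES = "\u210E\u210F\u2107\u212F\u03C0"

names = {ch: unicodedata.name(ch) for ch in CANDIDATES}
for ch, name in names.items():
    print(f"U+{ord(ch):04X} {ch} {name}")
```

Only the first three names mention a constant; the last two describe the glyph or the letter.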

Rich text and other compatibility numerals

The Western Arabic numerals also appear among the compatibility characters as rich text variant forms including bold, double-struck, monospace, sans-serif and sans-serif bold, along with fullwidth variants for legacy vertical text support.

Rich text parenthesized, circled and other variants are also included in the blocks Enclosed CJK Letters and Months; Enclosed Alphanumerics; Enclosed Alphanumeric Supplement; Superscripts and Subscripts; Number Forms; and Dingbats.
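
Many of these compatibility numerals can be folded back to plain digits with compatibility normalization (NFKC). A minimal Python sketch follows; the function name is hypothetical. NFKC also rewrites other compatibility characters, and variants without a compatibility mapping (for example the Dingbats circled digits) pass through unchanged.

```python
import unicodedata

def fold_compat_numerals(text: str) -> str:
    """Fold styled, enclosed and fullwidth numerals via NFKC."""
    return unicodedata.normalize("NFKC", text)

# Bold 1, double-struck 2, sans-serif 3 from Mathematical Alphanumeric Symbols.
print(fold_compat_numerals("\U0001D7CF\U0001D7DA\U0001D7E5"))
# Fullwidth 4 and circled 2.
print(fold_compat_numerals("\uFF14\u2461"))
```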

Suzhou (huāmǎ/Sūzhōu mǎzi) numerals

Main articles: Suzhou numerals and Chinese numerals

The huāmǎ (simplified Chinese: 花码; traditional Chinese: 花碼) / Sūzhōu mǎzi (simplified Chinese: 苏州码子; traditional Chinese: 蘇州碼字) system is a variation of the rod numeral system. Rod numerals are closely related to the counting rods and the abacus, which is why the numeric symbols for 1, 2, 3, 6, 7 and 8 in the huāmǎ system are represented in a similar way as on the abacus. Nowadays, the huāmǎ system is only used for displaying prices in Chinese markets or on traditional handwritten invoices.

The digits of the Suzhou numerals are in the CJK Symbols and Punctuation block at U+3021–U+3029, U+3007, U+5341, U+5344, and U+5345. In Unicode 3.0 these characters are incorrectly called Hangzhou style numerals. In Unicode 4.0, an erratum was added which stated:^[2]

The Suzhou numerals (Chinese su1zhou1ma3zi) are special numeric forms used by traders to display the prices of goods. The use of "HANGZHOU" in the names is a misnomer.

All references to "Hangzhou" in the Unicode standard have been corrected to "Suzhou" except for the character names themselves, which cannot be changed once assigned, according to the Unicode Stability Policy.^[3] (This policy allows software to use the names as unique identifiers.)
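
The frozen names are still visible to software. As a small illustration, the Python sketch below lists the names and numeric values of U+3021 to U+3029 using the standard `unicodedata` module; the dictionary name `suzhou` is illustrative.

```python
import unicodedata

# Map each character in U+3021..U+3029 to (character name, numeric value).
suzhou = {
    chr(cp): (unicodedata.name(chr(cp)), unicodedata.numeric(chr(cp)))
    for cp in range(0x3021, 0x302A)
}

for ch, (name, value) in suzhou.items():
    print(f"U+{ord(ch):04X} {ch} {name} = {value}")
```

Every name in the output still begins with HANGZHOU, as the stability policy requires.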

Japanese and Korean numerals

Main articles: Japanese numerals and Korean numerals

Ancient Greek numerals

Unicode provides support for several variants of Greek numerals, assigned to the Supplementary Multilingual Plane from U+10140 through U+1018F.^[4]

Main article: Attic numerals

Attic numerals were used by ancient Greeks, possibly from the 7th century BC. They were also known as Herodianic numerals because they were first described in a 2nd-century manuscript by Herodian. They are also known as acrophonic numerals because all of the symbols used derive from the first letters of the words that the symbols represent: 'one', 'five', 'ten', 'hundred', 'thousand' and 'ten thousand'. See Greek numerals and acrophony.

Decimal	Symbol	Greek numeral
1	Ι	ἴος or ἰός (ios)
5	Π	πέντε (pente)
10	Δ	δέκα (deka)
100	Η	ἑκατόν (hekaton)
1000	Χ	χίλιοι (khilioi)
10000	Μ	μύριοι (myrioi)
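
Attic numerals are read additively, so a numeral written with the six basic symbols in the table can be evaluated by summing their values. The Python sketch below assumes the symbols are typed as Greek capital letters and ignores the composite symbols (for 50, 500 and so on) that the Unicode block also encodes; the names are hypothetical.

```python
# Values of the six basic symbols, keyed by Greek capital letters.
ATTIC_VALUES = {
    "\u0399": 1,      # Iota
    "\u03A0": 5,      # Pi
    "\u0394": 10,     # Delta
    "\u0397": 100,    # Eta
    "\u03A7": 1000,   # Chi
    "\u039C": 10000,  # Mu
}

def attic_to_int(numeral: str) -> int:
    """Sum the symbol values; raises KeyError on an unknown symbol."""
    return sum(ATTIC_VALUES[symbol] for symbol in numeral)

# Chi Eta Eta Delta Delta Pi Iota Iota
print(attic_to_int("\u03A7\u0397\u0397\u0394\u0394\u03A0\u0399\u0399"))  # 1227
```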

Ancient Greek Numbers^[1]^[2]
Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+1014x	𐅀	𐅁	𐅂	𐅃	𐅄	𐅅	𐅆	𐅇	𐅈	𐅉	𐅊	𐅋	𐅌	𐅍	𐅎	𐅏
U+1015x	𐅐	𐅑	𐅒	𐅓	𐅔	𐅕	𐅖	𐅗	𐅘	𐅙	𐅚	𐅛	𐅜	𐅝	𐅞	𐅟
U+1016x	𐅠	𐅡	𐅢	𐅣	𐅤	𐅥	𐅦	𐅧	𐅨	𐅩	𐅪	𐅫	𐅬	𐅭	𐅮	𐅯
U+1017x	𐅰	𐅱	𐅲	𐅳	𐅴	𐅵	𐅶	𐅷	𐅸	𐅹	𐅺	𐅻	𐅼	𐅽	𐅾	𐅿
U+1018x	𐆀	𐆁	𐆂	𐆃	𐆄	𐆅	𐆆	𐆇	𐆈	𐆉	𐆊	𐆋	𐆌	𐆍	𐆎	

Notes
1.^ As of Unicode version 17.0
2.^ The empty cell indicates a non-assigned code point

Roman numerals

Roman numerals originated in ancient Rome, adapted from Etruscan numerals. The system used in classical antiquity was slightly modified in the Middle Ages to produce the system we use today. It is based on certain letters which are given values as numerals.

Roman numerals are commonly used today in numbered lists (in outline format), clockfaces, pages preceding the main body of a book, chord triads in music analysis (Roman numeral analysis), the numbering of movie and video game sequels, book publication dates, successive political leaders or children with identical names, and the numbering of some sport events, such as the Olympic Games or the Super Bowl.

Unicode has a number of characters specifically designated as Roman numerals, as part of the Number Forms^[5] range from U+2160 to U+2188. This range includes both upper- and lowercase numerals, as well as pre-combined characters for numbers up to 12 (XII). One reason for the existence of pre-combined numbers is to facilitate the setting of multiple-letter numbers (such as VIII) on a single horizontal line in Asian vertical text. The Unicode Standard, however, includes special Roman numeral code points for compatibility only, stating that "[f]or most purposes, it is preferable to compose the Roman numerals from sequences of the appropriate Latin letters".^[6]
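
The relationship between the pre-combined characters and plain Latin letters can be seen through compatibility normalization. The Python sketch below uses the standard `unicodedata` module; the helper name `compose_from_latin` is made up for this example.

```python
import unicodedata

def compose_from_latin(text: str) -> str:
    """Replace Roman numeral code points with sequences of Latin letters."""
    return unicodedata.normalize("NFKC", text)

for ch in "\u2167\u216B\u2173":
    print(ch, compose_from_latin(ch), unicodedata.numeric(ch))
```

Each dedicated code point also carries its value as a numeric property, which the Latin-letter sequence does not.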

Additionally, characters exist for archaic^[5] forms of 1000, 5000, 10,000, large reversed C (Ɔ), late 6 (ↅ, similar to Greek Stigma: Ϛ), early 50 (ↆ, similar to down arrow ↓⫝⊥^[7]), 50,000, and 100,000. The small reversed c, ↄ, is not intended to be used in Roman numerals, but as the lower case of the Claudian letter Ↄ.

Table of Roman numerals in Unicode
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
Value^[8]	1	2	3	4	5	6	7	8	9	10	11	12	50	100	500	1,000
U+216x	Ⅰ	Ⅱ	Ⅲ	Ⅳ	Ⅴ	Ⅵ	Ⅶ	Ⅷ	Ⅸ	Ⅹ	Ⅺ	Ⅻ	Ⅼ	Ⅽ	Ⅾ	Ⅿ
U+217x	ⅰ	ⅱ	ⅲ	ⅳ	ⅴ	ⅵ	ⅶ	ⅷ	ⅸ	ⅹ	ⅺ	ⅻ	ⅼ	ⅽ	ⅾ	ⅿ

	0	1	2	3	4	5	6	7	8
Value	1,000	5,000	10,000	100	100	6	50	50,000	100,000
U+218x	ↀ	ↁ	ↂ	Ↄ	ↄ	ↅ	ↆ	ↇ	ↈ

If using blackletter or script typefaces, Roman numerals are set in Roman type. Such typefaces may contain Roman numerals matching the style of the typeface in the Unicode range U+2160–U+217F; if they do not, a matching Antiqua typeface is used for Roman numerals.

Unicode has characters for Roman fractions in the Ancient Symbols^[9] block: sextans, uncia, semuncia, sextula, dimidia sextula, siliqua, and as.

Counting rod numerals

Main article: Counting rods

Counting rod numerals are included in their own block in the Supplementary Multilingual Plane (SMP) as of Unicode 5.0. There are nine "horizontal" digits (U+1D360 to U+1D368) and nine "vertical" digits (U+1D369 to U+1D371); the horizontal digits are used for odd powers of ten and the vertical digits for even powers of ten. Zero should be represented by U+3007 (〇, ideographic number zero) and the negative sign should be represented by U+20E5 (combining reverse solidus overlay).^[10] This block also contains other counting-rod-like symbols, such as the well-known tally mark for 5 (four strokes struck through by a fifth). Because these characters are not in the BMP, font support may still be limited.
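
The alternation described above can be sketched as follows. The assignment of the two code point ranges to odd and even powers of ten follows the paragraph as written and should be treated as an assumption: the Unicode character names label the first range UNIT DIGIT and the second TENS DIGIT, so some readers may prefer the opposite assignment. The function name is hypothetical and negative numbers are not handled.

```python
HORIZONTAL_ONE = 0x1D360  # first "horizontal" digit (assumed: odd powers)
VERTICAL_ONE = 0x1D369    # first "vertical" digit (assumed: even powers)
ZERO = "\u3007"           # ideographic number zero

def to_counting_rods(n: int) -> str:
    """Write a non-negative integer with counting rod digits."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n == 0:
        return ZERO
    rods = []
    power = 0
    while n:
        n, digit = divmod(n, 10)
        if digit == 0:
            rods.append(ZERO)
        else:
            base = HORIZONTAL_ONE if power % 2 else VERTICAL_ONE
            rods.append(chr(base + digit - 1))
        power += 1
    # Digits were collected least significant first.
    return "".join(reversed(rods))

print(to_counting_rods(12), to_counting_rods(105))
```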

Counting Rod Numerals^[1]^[2]
Official Unicode Consortium code chart (PDF)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
U+1D36x	𝍠	𝍡	𝍢	𝍣	𝍤	𝍥	𝍦	𝍧	𝍨	𝍩	𝍪	𝍫	𝍬	𝍭	𝍮	𝍯
U+1D37x	𝍰	𝍱	𝍲	𝍳	𝍴	𝍵	𝍶	𝍷	𝍸							

Notes
1.^ As of Unicode version 17.0
2.^ Empty cells indicate non-assigned code points

References

1.^ It is unknown which constant this is supposed to be. Xerox standard XCCS 353/046 just says "Euler's."
2.^ Freytag, Asmus; Rick McGowan; Ken Whistler (2006-05-08). "UTN #27: Known anomalies in Unicode Character Names". Technical Notes. Unicode Consortium. Retrieved 2008-06-13.
3.^ "Name Stability". Unicode Character Encoding Stability Policy. Unicode Consortium. 2008-02-28. Retrieved 2008-06-13.
4.^ Unicode Charts: Ancient Greek Numbers
5.^ a b Unicode Number Forms
6.^ The Unicode Standard, Version 6.0 – Electronic edition (PDF), Unicode, Inc., 2011, p. 486
7.^ David J. Perry: Proposal to Add Additional Ancient Roman Characters to UCS
8.^ For the first two rows
9.^ Unicode Ancient Symbols
10.^ The Unicode Standard, Version 5.0 – Electronic edition (PDF), Unicode, Inc., 2006, pp. 499–500
