Chinese character description languages

From Wikipedia, the free encyclopedia
Schemas to decompose Chinese characters

Several systems have been proposed for describing the internal structure of Chinese characters, including their strokes, components, and stroke order, and the location of each within the character's ideal square. This information is useful for identifying variants of characters that are unified into one code point by Unicode and ISO/IEC 10646, and for providing an alternative representation of rare characters that do not yet have a standardized encoding in Unicode. Many of these systems target regular script and expose each character's internal structure, which makes look-up easier by allowing characters to be indexed by their make-up and cross-referenced with similar characters.

CDL

Character Description Language (CDL) is an XML-based declarative language co-created by Tom Bishop and Richard Cook for the Wenlin Institute. It defines characters by the arrangement of components, which are not required to reflect the semantic or etymological history of the character. Components are fitted into their allotted portion of the whole character's square. A set of fewer than 50 strokes suffices to construct approximately 1,000 components, which in turn can describe tens of thousands of characters.[1]

Ideographic Description Sequences

Main article: Ideographic Description Characters (Unicode block)

Chapter 18 of The Unicode Standard (version 15.0) defines the "Ideographic Description Sequences" (IDS) syntax, which describes characters in featural terms as arrangements of components that have code points. Sixteen special characters in the range U+2FF0..U+2FFF act as prefix operators that combine other characters or sequences into larger characters.

Ideographic Description Characters in Unicode

Character  Unicode character number  Full Unicode name
⿰         U+2FF0                    Ideographic description character left to right
⿱         U+2FF1                    Ideographic description character above to below
⿲         U+2FF2                    Ideographic description character left to middle and right
⿳         U+2FF3                    Ideographic description character above to middle and below
⿴         U+2FF4                    Ideographic description character full surround
⿵         U+2FF5                    Ideographic description character surround from above
⿶         U+2FF6                    Ideographic description character surround from below
⿷         U+2FF7                    Ideographic description character surround from left
⿼         U+2FFC                    Ideographic description character surround from right
⿸         U+2FF8                    Ideographic description character surround from upper left
⿹         U+2FF9                    Ideographic description character surround from upper right
⿺         U+2FFA                    Ideographic description character surround from lower left
⿽         U+2FFD                    Ideographic description character surround from lower right
⿻         U+2FFB                    Ideographic description character overlaid
⿾         U+2FFE                    Ideographic description character horizontal reflection
⿿         U+2FFF                    Ideographic description character rotation
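
Because the operators are prefix operators, a sequence can be read with a short recursive-descent parser. The following Python sketch is illustrative only: the names `arity`, `Node`, and `parse_ids` are hypothetical, and the operand counts (one for reflection and rotation, three for the two "middle" operators, two for the rest) are an assumption based on the operator names in the table rather than a quotation of the standard's grammar.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Assumed operand counts for the operators in U+2FF0..U+2FFF.
UNARY = {"\u2ffe", "\u2fff"}      # horizontal reflection, rotation
TERNARY = {"\u2ff2", "\u2ff3"}    # left-middle-right, above-middle-below
BINARY = {chr(cp) for cp in range(0x2FF0, 0x3000)} - UNARY - TERNARY


def arity(ch: str) -> int:
    """Number of operands an operator takes; 0 for an ordinary component."""
    if ch in UNARY:
        return 1
    if ch in BINARY:
        return 2
    if ch in TERNARY:
        return 3
    return 0


@dataclass
class Node:
    op: str        # the ideographic description character
    args: list     # operands: plain characters or nested Node objects


Tree = Union[str, Node]


def _parse(seq: str, i: int) -> Tuple[Tree, int]:
    """Parse one description starting at index i; return (tree, next index)."""
    if i >= len(seq):
        raise ValueError("sequence ends before all operands are supplied")
    ch = seq[i]
    i += 1
    n = arity(ch)
    if n == 0:
        return ch, i          # a leaf component
    args: List[Tree] = []
    for _ in range(n):
        arg, i = _parse(seq, i)
        args.append(arg)
    return Node(ch, args), i


def parse_ids(seq: str) -> Tree:
    """Parse a complete sequence; raise ValueError if it is malformed."""
    tree, end = _parse(seq, 0)
    if end != len(seq):
        raise ValueError("unexpected characters after a complete description")
    return tree


if __name__ == "__main__":
    print(parse_ids("⿱木⿰木木"))
```

Nested operators fall out of the recursion naturally: in `⿱木⿰木木` the second operand of `⿱` is itself the description `⿰木木`.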

Two additional characters used in ideographic descriptions are located in other Unicode blocks. U+303E IDEOGRAPHIC VARIATION INDICATOR is not officially an ideographic description character, but is sometimes used in ideographic description sequences.

Other Ideographic Description Characters in Unicode

Character  Code point  Block                        Name
〾         U+303E      CJK Symbols and Punctuation  Ideographic variation indicator
㇯         U+31EF      CJK Strokes                  Ideographic description character subtraction

These sequences are useful for describing to the reader a character that cannot be printed directly, either because it is absent from a given font or because it is absent from the Unicode standard altogether. For example, the sawndip character 𭨡, encoded in CJK Unified Ideographs Extension F as U+2DA21, can be described as ⿰書史. Another use is dictionary lookup, where a sequence serves as a rough input method for queries.
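
The dictionary-lookup use can be sketched as a search over a table of decompositions. This is a minimal illustration, not a real input method: `IDS_TABLE`, `components`, and `find_by_components` are hypothetical names, and apart from the ⿰書史 entry taken from the example above, the decompositions are illustrative entries added here.

```python
# Toy decomposition table. Only the first entry comes from the text;
# the others are illustrative assumptions.
IDS_TABLE = {
    "𭨡": "⿰書史",
    "明": "⿰日月",
    "林": "⿰木木",
    "森": "⿱木⿰木木",
    "好": "⿰女子",
}


def is_operator(ch: str) -> bool:
    """True for U+2FF0..U+2FFF and for U+31EF (subtraction)."""
    return 0x2FF0 <= ord(ch) <= 0x2FFF or ch == "\u31ef"


def components(ids: str) -> set:
    """Set of non-operator characters that appear in a sequence."""
    return {ch for ch in ids if not is_operator(ch)}


def find_by_components(wanted: str, table=IDS_TABLE) -> list:
    """Characters whose description contains every component in `wanted`."""
    needed = set(wanted)
    return sorted(ch for ch, ids in table.items() if needed <= components(ids))


if __name__ == "__main__":
    print(find_by_components("書史"))
    print(find_by_components("木"))
```

The search ignores how the components are arranged, which matches the "rough" nature of such queries; a stricter search would compare the parsed structure as well.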

These sequences can be rendered either by displaying the individual characters separately or by parsing the Ideographic Description Sequence and drawing the ideograph it describes. They do not, by themselves, provide unambiguous rendering for all characters. For instance, the sequence ⿱十一 represents both 土 'EARTH', where the middle bar is narrower than the base, and 士 'SCHOLAR', where the middle bar is wider.
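
The ambiguity can be made concrete by inverting a character-to-sequence table and listing the sequences that map to more than one character. The sketch below is illustrative: `PAIRS`, `invert`, and `ambiguous` are hypothetical names, the 土 and 士 entries follow the EARTH and SCHOLAR example above, and the 明 entry is an assumption added as a contrasting unambiguous case.

```python
from collections import defaultdict

# (character, description) pairs; a tiny illustrative sample.
PAIRS = [
    ("土", "⿱十一"),   # EARTH
    ("士", "⿱十一"),   # SCHOLAR
    ("明", "⿰日月"),   # assumed unambiguous entry for contrast
]


def invert(pairs):
    """Map each description to the list of characters that share it."""
    index = defaultdict(list)
    for ch, ids in pairs:
        index[ids].append(ch)
    return dict(index)


def ambiguous(index):
    """Keep only the descriptions shared by more than one character."""
    return {ids: chars for ids, chars in index.items() if len(chars) > 1}


if __name__ == "__main__":
    for ids, chars in ambiguous(invert(PAIRS)).items():
        print(ids, chars)
```

A renderer that draws glyphs from sequences alone has no way to choose between the characters in such a group, because relative stroke lengths are not part of the description.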

Unicode's specification for these sequences is based on the characters and syntax of the earlier GBK encoding. Additional symbols were later encoded to fill in the missing combinations.

The IDSgrep free software package by Matthew Skala[2][3] extends Unicode's IDS syntax with additional features for dictionary lookup. It can convert KanjiVG's database to its own extended IDS (EIDS) format, or search EIDS files generated by the related Tsukurimashou font family.

References

Citations

  1. Bishop & Cook (2003c), pp. 2, 9.
  2. "IDSgrep", Tsukurimashou Project, 2024, archived from the original on 2024-02-07
  3. Skala, Matthew (2015), "A Structural Query System for Han Characters" (PDF), International Journal of Asian Language Processing, vol. 23, no. 2, pp. 127–159, arXiv:1404.5585, archived from the original (PDF) on 2016-03-04, retrieved 2016-01-13

Retrieved from "https://en.wikipedia.org/w/index.php?title=Chinese_character_description_languages&oldid=1331098533"