Mojikyō

From Wikipedia, the free encyclopedia
Character encoding scheme
Mojikyō
Konjaku Mojikyō (今昔文字鏡)
The Mojikyō character map highlighting the Taiwanese kana [note 1]
Developers: Tadahisa Ishikawa (石川忠久), Tokio Furuya (古家時雄), Mojikyō Institute (文字鏡研究会)
Initial release: 1.0 / July 1997
Final release: 4.0 / 15 December 2018
Operating system: Microsoft Windows
Size: 51 MB
Available in: Japanese
Type: Character set bundled with fonts and a character map
License: Proprietary
Website: mojikyo.org

Mojikyō (Japanese: 文字鏡), also known by its full name Konjaku Mojikyō (今昔文字鏡, lit. '(the) past and present character mirror'), is a character encoding scheme created to provide a complete index of characters used in the Chinese, Japanese, Korean, Vietnamese Chữ Nôm and other historical Chinese logographic writing systems. The Mojikyō Institute (文字鏡研究会, Mojikyō Kenkyūkai), which published the character set, also published computer software and TrueType computer fonts to accompany it. The Mojikyō Institute, chaired by Tadahisa Ishikawa (石川忠久),[1] originally distributed its character set and related software and data on CD-ROMs sold in Kinokuniya stores.[2]

Conceptualized in 1996,[3] the first version of the CD-ROM was released in July 1997.[4] For a time, the Mojikyō Institute also offered a web subscription, termed "Mojikyō WEB" (文字鏡WEB), which had more up-to-date characters.[5]

As of September 2006, Mojikyō encoded 174,975 characters.[6] Among those, 150,366 characters (≈86%) then belonged to the extended Chinese–Japanese–Korean–Vietnamese (CJKV)[note 2] family.[5] Many of Mojikyō's characters are considered obsolete or obscure, and are not encoded by any other character set, including the most widely used international text encoding standard, Unicode.

Mojikyō was originally a paid proprietary software product. In 2015, the Mojikyō Institute began to upload its latest releases to the Internet Archive as freeware,[7] as a memorial to one of its developers, Tokio Furuya (古家時雄), who died that year.[3] On 15 December 2018, version 4.0 was released. The next day, Ishikawa announced that without Furuya this would be the final release of Mojikyō.[3]

Premise

The Mojikyō encoding was created to provide a complete index of characters used in the Chinese, Japanese, and Korean writing systems and the Vietnamese Chữ Nôm logographic script. It also encodes a large number of characters in ancient scripts, such as the oracle bone script, the seal script, and Sanskrit (Siddhaṃ). Many characters are encoded only in Mojikyō, and its data is often used as a starting point for Unicode proposals.[8][9] However, Mojikyō has much looser standards for encoding than Unicode, so it contains many glyphs of dubious, or even unintentionally fictional, origin.[10][11] As such, while many non-Unicode Mojikyō characters are suitable for addition to Unicode, not all can become Unicode characters, because the two apply different standards of evidence.

Composition

The Mojikyō fonts (文字鏡フォント) are TrueType fonts that come in a ZIP file and are each around 2–5 megabytes; the different fonts contain different numbers of characters.[note 3] Also included is a Windows executable that implements a graphical character map, the "Mojikyō Character Map" (文字鏡MAP), MOCHRMAP.EXE.[note 4][note 5] MOCHRMAP.EXE allows users to browse through the Mojikyō fonts, and to copy and paste characters in lieu of typing them on the keyboard. Unlike the regular Windows character map or KCharSelect, both of which support TrueType fonts, MOCHRMAP.EXE displays the numbered Mojikyō encoding slot of the requested character.[12][note 6] For MOCHRMAP.EXE to work, all Mojikyō fonts must be installed.[note 7]

Encoding

When referring to a character encoded in Mojikyō, the format MXXXXXX is often used,[13] similar to the U+XXXX format used for Unicode. A difference, however, is that Mojikyō encodings displayed this way are decimal, while Unicode's U+ encoding is hexadecimal.
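The decimal versus hexadecimal distinction can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and zero-padding the Mojikyō number to six digits is an assumption drawn from the six X's in "MXXXXXX", not a documented rule.

```python
def format_mojikyo(number: int) -> str:
    """Render a Mojikyō number as M + six DECIMAL digits (padding is an assumption)."""
    if not 0 <= number <= 999_999:
        raise ValueError("number does not fit in six decimal digits")
    return f"M{number:06d}"


def parse_mojikyo(label: str) -> int:
    """Read an MXXXXXX label, interpreting the digits as base 10."""
    digits = label[1:]
    if not (label.startswith("M") and digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an M-number: {label!r}")
    return int(digits, 10)


def format_unicode(code_point: int) -> str:
    """Render a Unicode code point as U+ followed by at least four HEX digits."""
    return f"U+{code_point:04X}"


def parse_unicode(label: str) -> int:
    """Read a U+XXXX label, interpreting the digits as base 16."""
    if not label.startswith("U+"):
        raise ValueError(f"not a U+ label: {label!r}")
    return int(label[2:], 16)


# The same digit string names different numbers depending on the base.
as_decimal = int("66038", 10)  # how a Mojikyō number is read
as_hex = int("66038", 16)      # how a Unicode code point is read
```

Here `as_decimal` is 66038 while `as_hex` is 417848, which is why a Mojikyō number cannot be pasted into a U+ label, or the reverse, without a conversion table.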

From the earliest days of Unicode, Mojikyō has both influenced the standard and been influenced by it. Glyphs originating from Mojikyō first appeared in a proposal to the Ideographic Rapporteur Group (IRG),[note 8] which is responsible for maintaining all CJK blocks in Unicode,[14][15] on 18 April 2002.[16] In May 2007, Mojikyō played a minor role in an eventually successful series of proposals to encode the Tangut script in Unicode;[17][note 9] Mojikyō had already encoded 6,000 Tangut characters by October 2002.[6]

The Unicode Standard's Unihan Database refers to Mojikyō as the "Japanese KOKUJI Collection" (日本国字集),[18] abbreviated "JK".[19][20] For example, U+2B679 𫙹 CJK UNIFIED IDEOGRAPH-2B679,[note 10] an ideograph read in Japanese as burizādo (ブリザード, lit. 'blizzard'), has a J-Source[note 11] equal to JK-66038. All Unicode characters with a JK-prefixed J-Source originate from Mojikyō.[21][note 12] According to Ken Lunde, a subject matter expert in character encodings and East Asian languages, as of Unicode 13.0, 782 ideographs in Unicode originate from Mojikyō, split somewhat evenly between two blocks: CJK Unified Ideographs Extension C, with 367, and CJK Unified Ideographs Extension E, with 415.[20][22] Not all Unicode characters with Mojikyō origins (JK-prefixed J-Sources) have the same representative glyph in the code chart as in the Mojikyō font;[note 13] some characters had their shapes changed before final encoding, as investigation showed the shapes assigned by the Mojikyō Institute were wrong.[11][note 14]
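As a rough sketch of how JK-prefixed J-Sources could be picked out of Unihan data, the Python below scans tab-separated lines of the form "code point, field name, value" (the layout used by the Unihan text files) and keeps only kIRG_JSource values that start with "JK-". The function names and the tiny in-line sample are illustrative assumptions. The sample is not the real database, so it will not reproduce the 367 and 415 counts quoted above.

```python
# Block ranges for the two extensions named in the text.
EXT_C = range(0x2A700, 0x2B740)  # CJK Unified Ideographs Extension C
EXT_E = range(0x2B820, 0x2CEB0)  # CJK Unified Ideographs Extension E

# Illustrative sample only. The first data line is the example from the text;
# the second is included just to show a J-Source without the JK prefix.
SAMPLE = """\
# code point <TAB> field <TAB> value
U+2B679\tkIRG_JSource\tJK-66038
U+4E00\tkIRG_JSource\tJ0-306C
"""


def jk_sources(lines):
    """Map Unicode code point -> Mojikyō number for every JK-prefixed J-Source."""
    found = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # not a "code point, field, value" record
        label, field, value = fields
        if field == "kIRG_JSource" and value.startswith("JK-"):
            # U+ labels are hexadecimal; the number after JK- is decimal.
            found[int(label[2:], 16)] = int(value[3:], 10)
    return found


def block_of(code_point):
    """Name the extension block a code point falls in, if either."""
    if code_point in EXT_C:
        return "Extension C"
    if code_point in EXT_E:
        return "Extension E"
    return "other"


jk = jk_sources(SAMPLE.splitlines())
blocks = {cp: block_of(cp) for cp in jk}
```

Run over the sample, `jk` holds a single entry mapping 0x2B679 to 66038, and `blocks` places that code point in Extension C. Pointing the same function at the lines of a downloaded Unihan file would be the natural next step, under the assumption that the file keeps this three-column layout.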

Blocks

As of September 2006, Mojikyō encoded 174,975 characters.[6] Among those, 150,366 characters then belonged to the extended CJKV[note 2] family.[5] Many of the encoded characters are considered obsolete or otherwise obscure, and are not encoded by any other character set, including the international standard, Unicode. Each Mojikyō character has a unique number, and the characters are organized into blocks.

Mojikyō puts CJKV characters in different blocks according to their traditional Kangxi radical. Common radicals containing an especially high number of characters, such as Radicals 9 (人) and 162 (辵), are split further by stroke order.[note 15]

No unification

Unlike Unicode, Mojikyō purposely avoids Han unification; no attempt at compactness of the encoding is made, nor is there an attempt to keep all common characters below U+FFFF as there is in Unicode.[citation needed]

Unicode, on the other hand, sorts its CJK characters into blocks based on how common they are: the most common are generally put into the Basic Multilingual Plane,[note 14] while those that are rare or obscure are put into the Supplementary Planes.[citation needed]
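The practical meaning of "below U+FFFF" can be illustrated with a short Python sketch. Unicode planes each hold 65,536 code points, so the plane number is the code point shifted right by 16 bits; plane 0 is the Basic Multilingual Plane, and anything above it needs two 16-bit units in UTF-16. The function names are hypothetical, and the two code points used are the ones that appear elsewhere in this article.

```python
def plane(code_point: int) -> int:
    """Return the Unicode plane number (0 is the Basic Multilingual Plane)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16  # each plane spans 0x10000 code points


def utf16_units(code_point: int) -> int:
    """Count the 16-bit units needed to store one code point in UTF-16."""
    return len(chr(code_point).encode("utf-16-be")) // 2


katakana_se = 0x30BB    # U+30BB KATAKANA LETTER SE
ext_c_example = 0x2B679  # U+2B679, in CJK Unified Ideographs Extension C

planes = (plane(katakana_se), plane(ext_c_example))
units = (utf16_units(katakana_se), utf16_units(ext_c_example))
```

Here `planes` is `(0, 2)` and `units` is `(1, 2)`: the katakana letter sits in the Basic Multilingual Plane and fits in one UTF-16 unit, while the Extension C ideograph sits in a supplementary plane and needs a surrogate pair.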

License

Mojikyō is proprietary software under a restrictive license. Originally, the Mojikyō Institute tried to prevent its character data from being used, and threatened those who published conversion tables to and from its character set. In July 2010, the Mojikyō Institute abandoned its legal efforts to stop at least one Japanese user from publishing conversion tables or converting characters encoded in Mojikyō to Unicode or other character sets.[23] Mere data, sometimes including the shapes of letters, are considered in many jurisdictions to be common property as they do not meet the threshold of originality.[note 16]

Due to this legacy, however, GlyphWiki disallowed Mojikyō data as of 2020.[24]

Collected writing systems

Living

Dead or obsolete

See also

References

  1. "今昔文字鏡について" [About Mojikyō]. Mojikyō Institute (in Japanese). Archived from the original on 3 February 2001. Retrieved 6 July 2020.
  2. ようこそ、今昔文字鏡の世界へ! [Welcome to the world of Mojikyō!] (in Japanese). Kinokuniya KK. Archived from the original on 4 March 2005. Retrieved 5 July 2020.
  3. Ishikawa, Tadahisa (August 2015). "古家時雄君を悼む" [Tokio Furuya, we grieve your death]. Mojikyō Institute (in Japanese). Retrieved 8 July 2020.
  4. Konjaku Mojikyō 今昔文字鏡 (in Japanese), July 1997, ISBN 9784314900034
  5. 今昔文字鏡とは [About Mojikyo] (in Japanese). Kinokuniya KK. Archived from the original on 27 April 2010. Retrieved 5 July 2020.
  6. 今昔文字鏡とは [What is Mojikyō?] (in Japanese). Kinokuniya KK. Archived from the original on 5 February 2005. Retrieved 5 July 2020.
  7. "Search: creator:"MOJIKYO Institute"". Internet Archive. Retrieved 6 July 2020.
  8. Takada, Tomokazu; Yada, Tsutomu; Saito, Tatsuya (18 September 2015). Proposal for hentaigana (PDF). Translated by Kobayashi, Tatsuo; Kobayashi, Daniel. Information Processing Society of Japan. L2/15-239. Retrieved 5 July 2020 – via Unicode Consortium.
  9. Hiura, Hideki; Kobayashi, Tatsuo; et al. (31 October 2003). Ideograph Variation Selector and Variation Collection Identifier. Open Internationalization Initiative. L2/03-413. Retrieved 5 July 2020 – via Unicode Consortium.
  10. Takada, Tomokazu [高田智和]; Oda, Tetsuji [織田哲治]; et al. (26 August 2013). 平成25年度第3回文字情報検討サブワーキンググループ議事録 [Meeting Minutes of the Third Character Information Examination Sub-Working Group of 2013 (Heisei 25)] (PDF). Information Technology Promotion Agency, Government of Japan (in Japanese). p. 2. Retrieved 6 July 2020. 文字鏡研究会の関係者にヒアリングしたところ、オランダから提案されたWG2 N36981には文字鏡のフォントが使用されているが、文字鏡研究会は関与しておらず、提案内容についても疑問があるとのことであった。 [According to an interview with a representative of the Mojikyō Institute, a Mojikyō font is used in WG2 N36981 proposed by the Netherlands, but the Mojikyō Institute itself is not involved with the proposal; it furthermore has doubts about some of the content of that proposal.]
  11. Suzuki, Toshiya [鈴木俊哉] (30 July 2009). 統合漢字に申請された「殷周金文集成引得」図形文字の調査 [Investigation on Glyphs collected from "Index to Collection of Inscriptions of the Yin-Zhou Period" to submit to CJK Unified Ideographs]. IPSJ SIG Technical Report (in Japanese). 2009-DD-72 (7). Information Processing Society of Japan: 2 – via Internet Archive. しかし、拡張Cの標準化作業が8年の長期にわたり、また事後的に用例が必須とされたため、正式に公布された拡張C漢字の典拠は当初の典拠とはかなり異なるものとなっている。たとえば日本では当初は文字鏡研究会によって選定された1000文字程度の漢字を申請していた[。] [...] 典拠用例確認は文字鏡とは独立に行なわれたため、字形が文字鏡漢字から変更されたものも多い。 [As the standardization effort for CJK Unified Ideographs Extension C has been eight long years in the making and examples of kanji have been requested after their encoding, the officially promulgated Extension C kanji standard is quite different from the original standard. For example, we, the Government of Japan, initially applied for about 1,000 kanji selected by the Mojikyō Institute[.] [...] Since the verification of the kanji was performed independently of the Mojikyō Institute, the character shapes were often changed from Mojikyō's version of that same codepoint.]
  12. Ishikawa, Tadahisa (25 May 1999). "パソコン悠悠漢字術 今昔文字鏡徹底活用" [Kanji on your PC, Made Easy: The Complete Mojikyō Manual]. Mojikyō Institute. Retrieved 6 July 2020.
  13. West, Andrew; Chan, Eiso (1 June 2018). "Table 5: Comparative Table of Shuishu Characters from All Sources" (PDF). Analysis of Shuishu character repertoire. pp. 21–212. ISO/IEC JTC1/SC2/WG2 N4956; UTC L2/18-193.
  14. "Unicode Standard Annex #45: U-source Ideographs". The Unicode Standard. Unicode Consortium.
  15. "Appendix E: Han Unification History" (PDF). The Unicode Standard. Unicode Consortium. March 2020.
  16. "CJK Extension C1 From Japan". Ideographic Rapporteur Group. IRG#19 N895 – via The Chinese University of Hong Kong's Department of Computer Science and Engineering. N895-Japan_C1
  17. Cook, Richard (9 May 2007). Proposal to encode Tangut characters in UCS Plane 1 (PDF). UC Berkeley Script Encoding Initiative. p. 4. L2/07-143 – via Unicode Consortium.
  18. Jenkins, John H.; Cook, Richard; Lunde, Ken, eds. (5 March 2020), "kIRG JSource", Unicode Standard Annex #38, Unicode Consortium
  19. Kobayashi, Tatsuo (3 December 2001). "List of Japanese Ideographs which may be proposed in Extension-C". ISO/IEC JTC1/SC2/WG2/IRG N853.
  20. Ken Lunde [@ken_lunde] (6 July 2020). "In particular, all 782 JK-prefixed ideographs are indeed from 今昔文字鏡 per IRG N862. Most were encoded in #ExtensionC, and the stragglers were encoded in #ExtensionE." (Tweet). Retrieved 6 July 2020 – via Twitter.
  21. Ken Lunde [@ken_lunde] (6 July 2020). "JK-prefixed J-Source ideographs came from 今昔文字鏡, which are in Extensions C and E (the mention of Extension D was simply that what became Extension E was originally targeted to become Extension D)" (Tweet). Archived from the original on 7 July 2020. Retrieved 6 July 2020 – via Twitter.
  22. Ken Lunde [@ken_lunde] (6 July 2020). "367 JK-prefixed ideographs are in Extension C, and the remaining 415 are in Extension E." (Tweet). Retrieved 6 July 2020 – via Twitter.
  23. "終戦宣言" [Announcement: The War is Over]. 青蛙亭漢語塾 [Seiwatei's Kanji Cram School] (in Japanese) (28 January 2016 ed.). 21 July 2010. Retrieved 7 July 2020.
  24. "データ・記事のライセンス" [License of our data and articles]. GlyphWiki (9 June 2010 ed.). Retrieved 6 July 2020. 今昔文字鏡およびその関連製品、データは、そのライセンス上グリフウィキには用いることができません。文字鏡番号(独自部分)および文字鏡のフォントに収録されているグリフそのもの、およびそれを参照、利用して作成していると判断できる情報は、グリフウィキに登録する際の典拠とすることはできませんので、ご協力をお願いいたします。 [Konjaku Mojikyō and related products and associated data are licensed in such a way that they are incompatible with our above GlyphWiki license. Neither the number of the Mojikyō encoding slot, nor the appearance of the glyph itself in Mojikyō's fonts, nor any information whatsoever that can be judged to have been gathered by referring to a Mojikyō product, can be used when entering data into GlyphWiki. We absolutely cannot accept Mojikyō data. Please cooperate with us.]

Notes

  1. As yet, it lacks a Unicode encoding, so it is approximated here with CSS and U+30BB KATAKANA LETTER SE.
  2. For Korean, Hanja are referred to. For Vietnamese, Chữ Nôm.
  3. Download the file MojikyoCmap400ALL49TTF.7z from the official website.
  4. English name from the title of the window produced by running the executable; Japanese name from the icon of the executable.
  5. Also called the "Mojikyō Cmap".
  6. See the screenshots on the official website.
  7. Into the system fonts directory, C:\Windows\Fonts.
  8. As of 2019, the IRG rebranded as the Ideographic Research Group.
  9. The history of the encoding of the Tangut script is quite complicated; see Tangut (Unicode block) § History for a full listing of all the related proposals and a timeline.
  10. Ideographic Description Sequence: ⿰魚嵐
  11. This is a column name in the Unihan database; ⟨J⟩ here is short for "Japanese glyph source". The full name of the column is kIRG_JSource. Under Han unification, there are nine such sources. See §3.1 of UAX #38 for a complete list and more information.
  12. Other J-Source prefixes exist, such as J4, meaning the character originates from JIS X 0213:2004.
  13. That is to say, a glyph made up of the same radicals in the same positions.
  14. Errors in large collections of ideographs are, of course, not uncommon. Such errors occur even in well-funded, government-produced collections, such as the famous kanji from unknown sources in the Japanese Industrial Standards Committee's JIS X 0208 double-byte character encoding standard. All of these JIS X 0208 error kanji (ghost characters, 幽霊文字) have made their way into Unicode despite not being "real" kanji.
  15. For proof, see the list in the Mojikyō Character Map, MOCHRMAP.EXE.
  16. See also: fictitious entry; trap street.

External links

Retrieved from "https://en.wikipedia.org/w/index.php?title=Mojikyō&oldid=1295308275"