A static site to link people to when their code is displaying Japanese wrong.
View the Project on GitHub: heistak/your-code-displays-japanese-wrong
If someone gave you a link to this page, that person probably thinks your code displays Japanese wrong. In short, to a native Japanese eye, yѳur ҭєxҭ lѳѳκs κιnd ѳf lικє ҭЋιs. This page gives a brief description of the glyph appearance problems that often arise in implementations of Asian text display: why they happen, why they are a big deal, and how to fix them.
Kanji, also known as Hanzi, Hanja or just Han Characters, is a set of characters that originated in China and is also used in Japan, Korea, Taiwan, etc. The Kanji sets used in those countries look mostly similar to each other, but each also has a large number of characters with different-looking glyphs. (Glyph is a typographical term which refers to the appearance of a character, as opposed to the meaning.)
For instance, here are the Japanese, Simplified Chinese, and Traditional Chinese glyph variants of the character that represents “knife edge”:
Language | Glyph | Unicode Code Point |
---|---|---|
Japanese | (glyph image) | U+5203 |
Simplified Chinese | (glyph image) | U+5203 |
Traditional Chinese | (glyph image) | U+5203 |
Therefore, if Japanese text is displayed using a Kanji glyph set meant for another language, it will look non-native, vaguely shady, and plain bizarre to a native Japanese reader because of the unfamiliar glyphs showing up in the text. This is most likely what’s happening with your program.
Back when Unicode was being designed, a decision called Han Unification was made to create a single unified set of all the Chinese (Simplified/Traditional), Japanese, and Korean Kanji characters. This involved giving the same code point to characters that were deemed equivalent across languages, which allowed the size of the character set to be kept small.
However, this also meant that characters which differ in appearance across languages, such as the Japanese, Simplified Chinese, and Traditional Chinese forms of 刃, were given identical code points! You can see in the earlier chart that the three “knife edges” were ALL assigned U+5203. It is up to the program displaying the text to render them using a font that can display the correct glyph set.
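To make the point concrete, here is a minimal Python sketch that inspects the character. The helper name `describe` is made up for this illustration. It only shows that the string data itself carries no hint about which language's glyph to draw.

```python
def describe(text: str) -> list[str]:
    """Return the code point of each character in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


knife_edge = "刃"

# One code point, no matter which language the surrounding text is in.
print(describe(knife_edge))        # ['U+5203']

# The encoded bytes are equally language-neutral.
print(knife_edge.encode("utf-8"))  # b'\xe5\x88\x83'
```

Because nothing in the text distinguishes the variants, the language has to be communicated separately, for example through markup or through the choice of font.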
In many cases, the default fallback behavior in an ambiguous situation is to choose the Simplified Chinese glyph set. Therefore, if the developer isn’t aware of it, Japanese text tends to be incorrectly displayed using Chinese glyphs.
Most likely, nobody has reported this issue to you because the people most impacted by it are not speakers of English!
And yes, since the app is not exactly unreadable in this state, it may be tempting to consider this issue minor and give it low priority. However, this issue is much more than the difference between, say, the lowercase A with the overhang (a) or without (α). As in the example at the beginning of this article, if the equivalent symptom were happening with English text, ιҭ wѳuld bє lѳѳκιng sѳmєҭЋιng lικє ҭЋιs.
Much like how the previous sentence immediately jumps out at you as weird and wrong, Japanese text rendered with an incorrect glyph set stands out to any native speaker of Japanese, and gives the impression that whoever developed the app does not care about this (often large) subset of the global user population. I hope you agree that this apathy is not the message you want to be sending.
Here are some characters that are known to have different glyph appearances between different languages.
刃直海角骨入
Try copy-pasting them into your code, look at the rendered results, and compare them with the Japanese sample below. If the glyph shapes look different from that sample (aside from differences due to the font’s styling), your code is displaying Japanese wrong.
In a nutshell, the way to fix it is to make your code and fonts aware that they are displaying Japanese whenever they do so.
On the web, browser rendering engines are usually smart enough to choose the correct fonts from generic font family declarations like `font-family: sans-serif`. However, a browser may choose the wrong font if the `lang` or `xml:lang` attribute of your DOM elements is not set to `ja`. Make sure that when you switch the output language of your pages to Japanese, the `lang` attribute also changes to `ja`.
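As a rough sketch of keeping the `lang` attribute in step with the output language, assume the pages are generated server-side by a small helper. `render_page` is a hypothetical function written for this example, not part of any framework.

```python
from html import escape


def render_page(lang: str, body_text: str) -> str:
    """Build a minimal HTML document whose root element declares its language."""
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{escape(lang, quote=True)}">\n'
        '<head><meta charset="utf-8"></head>\n'
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>"
    )


# The language tag travels with the text, so the browser can pick Japanese glyphs.
print(render_page("ja", "刃直海角骨入"))
```

The idea is simply that the language tag is passed along wherever the text is, rather than being hard-coded once in a template.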
Also, if explicitly specifying fonts in CSS, be sure to specify a font that is designed for the language. The following `font-family` declaration covers most standard Japanese fonts preinstalled in modern devices (courtesy of ICS Media):
body { font-family: "Helvetica Neue", Arial, "Hiragino Kaku Gothic ProN", "Hiragino Sans", Meiryo, sans-serif; }
Games often store and display fonts using a system that generates font texture atlases from a font file, such as Unity’s TextMesh Pro.
If you are using such a system, make sure you generate a separate font atlas for each Asian language, and that each source font used to generate them is specifically designed for that language. Google’s Noto project provides great open-licensed fonts specifically designed for Japanese, Simplified Chinese, Traditional Chinese, Korean, etc.
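One way to keep the per-language fonts straight in a build script is an explicit lookup table. The following is only a sketch: the `fonts` directory, the language tags used as keys, and the helper `atlas_source_font` are assumptions made for this example, and the file names are assumed to match the language-specific Noto Sans CJK downloads, so adjust them to whatever files you actually ship.

```python
from pathlib import Path

# Hypothetical location of the font files inside the project.
FONT_DIR = Path("fonts")

# One source font per language; file names are assumptions, not requirements.
ATLAS_SOURCE_FONTS = {
    "ja": "NotoSansCJKjp-Regular.otf",
    "zh-Hans": "NotoSansCJKsc-Regular.otf",
    "zh-Hant": "NotoSansCJKtc-Regular.otf",
    "ko": "NotoSansCJKkr-Regular.otf",
}


def atlas_source_font(lang: str) -> Path:
    """Return the font file to build the atlas from for the given language."""
    try:
        return FONT_DIR / ATLAS_SOURCE_FONTS[lang]
    except KeyError:
        # Fail loudly instead of silently falling back to another language's glyphs.
        raise ValueError(f"no atlas source font configured for {lang!r}") from None


for lang in ATLAS_SOURCE_FONTS:
    print(lang, "->", atlas_source_font(lang))
```

Raising an error for an unconfigured language is a deliberate choice here: a silent fallback is exactly how Japanese text ends up rendered with Chinese glyphs.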
A few other things. I broke them off to a separate page.
The author is a Japanese native who only speaks English/Japanese, and made this site out of a personal peeve. I don’t have much insight into other languages, sorry. If you can offer assistance with problems that happen in other languages/environments, or find any mistakes, please drop me a line.
Kenji Iguchi