Mojibake


Mojibake (Japanese: 文字化け; IPA: [mod͡ʑibake], 'character transformation') is the garbled or gibberish text that results from decoding text with an unintended character encoding.^[1] The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system.

The UTF-8-encoded Japanese Wikipedia article for Mojibake displayed as if interpreted as Windows-1252

The UTF-8-encoded Russian Wikipedia article on Church Slavonic displayed as if interpreted as KOI8-R


The garbled display may include the generic replacement character ⟨�⟩ in places where the binary representation is considered invalid. A replacement can also involve multiple consecutive symbols, as viewed in one encoding, when the same binary code constitutes one symbol in the other encoding. This happens either because the two encodings use different fixed code-unit lengths (as in Asian 16-bit encodings versus European 8-bit encodings) or because one of them is a variable-length encoding (notably UTF-8 and UTF-16).
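Both effects can be reproduced in a few lines. The following is a minimal Python sketch, not taken from the source; the sample character and the choice of ISO 8859-1 as the mismatched encoding are illustrative assumptions.

```python
# One symbol becoming several: "é" is two bytes in UTF-8 (C3 A9),
# and ISO 8859-1 maps every single byte to its own character.
utf8_bytes = "é".encode("utf-8")
as_latin1 = utf8_bytes.decode("iso-8859-1")
print(len(utf8_bytes), as_latin1)

# Invalid binary representation: the lone byte E9 ("é" in ISO 8859-1)
# is an incomplete sequence in UTF-8, so a lenient decoder substitutes
# the replacement character U+FFFD.
latin1_bytes = "é".encode("iso-8859-1")
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(as_utf8 == "\ufffd")
```

A strict decoder (the Python default, `errors="strict"`) would raise `UnicodeDecodeError` in the second case instead of substituting a character.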

Failed rendering of glyphs due to either missing fonts or missing glyphs in a font is a different issue that is not to be confused with mojibake. Symptoms of this failed rendering include blocks with the code point displayed in hexadecimal or using the generic replacement character. Importantly, these replacements are valid and are the result of correct error handling by the software.

Original text	文				字				化				け
Raw bytes of EUC-JP encoding	CA		B8		BB		FA		B2		BD		A4		B1
EUC-JP bytes interpreted as Shift-JIS	ﾊ		ｸ		ｻ		郾				ｽ		､		ｱ
EUC-JP bytes interpreted as GBK	矢				机				步				け
EUC-JP bytes interpreted as Windows-1252	Ê		¸		»		ú		²		½		¤		±
Raw bytes of UTF-8 encoding	E6	96		87	E5	AD		97	E5	8C		96	E3	81		91
UTF-8 bytes interpreted as Shift-JIS	譁			�	蟄			怜		喧			縺			�
UTF-8 bytes interpreted as GBK	鏂			囧		瓧			鍖			栥		亼
UTF-8 bytes interpreted as Windows-1252	æ	–		‡	å	SHY		—	å	Œ		–	ã	HOP		‘
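The byte rows in the table above can be regenerated with Python's standard codecs. This is a sketch under the assumption that Python's `euc_jp`, `gbk`, and `cp1252` codecs match the encodings named in the table. The Shift-JIS rows are omitted because the result depends on which Shift-JIS variant a decoder implements, and the UTF-8 bytes are not decoded as Windows-1252 here because Python's `cp1252` codec rejects the byte 81.

```python
text = "文字化け"

euc_jp_bytes = text.encode("euc_jp")
utf8_bytes = text.encode("utf-8")

# Raw bytes as uppercase hex pairs, like the table rows.
print(euc_jp_bytes.hex(" ").upper())
print(utf8_bytes.hex(" ").upper())

# The same EUC-JP bytes read through two unintended encodings.
as_gbk = euc_jp_bytes.decode("gbk")
as_cp1252 = euc_jp_bytes.decode("cp1252")
print(as_gbk, as_cp1252)
```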

Swedish example	Source encoding	Target encoding	Result
Smörgås (open sandwich)
	MS-DOS 437	ISO 8859-1	Smrgs
	UTF-8	ISO 8859-1	SmÃ¶rgÃ¥s
		IBM/CP037 (EBCDIC)	ë_C¶ÊÅCvË
		Mac Roman	Sm√∂rg√•s
	ISO 8859-1	Mac Roman	SmˆrgÂs
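When the wrong decoding mapped every byte to some character, the damage is reversible: re-encoding the garbled string with the wrong encoding recovers the original bytes, which can then be decoded as intended. A hedged Python sketch using two rows from the table above; the helper name `undo_mojibake` is hypothetical.

```python
def undo_mojibake(garbled: str, wrong: str, intended: str) -> str:
    """Reverse a lossless mis-decoding: text -> original bytes -> text."""
    return garbled.encode(wrong).decode(intended)

original = "Smörgås"

# UTF-8 bytes misread as ISO 8859-1 and as Mac Roman.
via_latin1 = original.encode("utf-8").decode("iso-8859-1")
via_mac = original.encode("utf-8").decode("mac_roman")
print(via_latin1, via_mac)

# Both misreadings keep every byte, so both can be undone.
print(undo_mojibake(via_latin1, "iso-8859-1", "utf-8"))
print(undo_mojibake(via_mac, "mac_roman", "utf-8"))
```

This only works while no byte has been dropped or replaced. Once a decoder has substituted U+FFFD, or a control character has been stripped, the original byte is gone.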

Romanian example	Source encoding	Target encoding	Result
Cenușă (ash)
	UTF-8
		ASCII	CenuÈ™Äƒ
		ISO 8859-2	CenuČÄ
		OEM 737	Cenu╚β─Δ
		Shift-JIS	Cenuﾈ卞
		TIS-620	Cenuศฤ
		IBM/CP037 (EBCDIC)	äÁ>ÍHrDc

Hungarian example	Source encoding	Target encoding	Result	Occurrence
ÁRVÍZTŰRŐ TÜKÖRFÚRÓGÉP árvíztűrő tükörfúrógép
	UTF-8 Quoted-printable	7-bit ASCII	=C3=81RV=C3=8DZT=C5=B0R=C5=90 T=C3=9CK=C3=96RF=C3=9AR=C3=93G=C3=89P =C3=A1rv=C3=ADzt=C5=B1r=C5=91 t=C3=BCk=C3=B6rf=C3=BAr=C3=B3g=C3=A9p	Mainly caused by incorrectly configured mail servers but may occur in SMS messages on some cell phones as well.
	ISO 8859-2 Quoted-printable	7-bit ASCII	=C1RV=CDZT=DBR=D5 T=DCK=D6RF=DAR=D3G=C9P =E1rv=EDzt=FBr=F5 t=FCk=F6rf=FAr=F3g=E9p
	CWI-2	CP 437	ÅRVìZTÿRº TÜKÖRFùRòGÉP árvíztûrô tükörfúrógép	The CWI-2 encoding was designed so that Hungarian text remains fairly readable even if the device on the receiving end uses one of the default encodings (CP 437 or CP 850). This encoding was used very heavily between the early 1980s and early 1990s, but it is now completely deprecated.
	CP 852	CP 437	╡RV╓ZTδRè TÜKÖRFΘRαGÉP árvízt√rï tükörfúrógép	This was very common in the days of DOS, as the text was often encoded using code page 852 ("Central European"), but the software on the receiving end often did not support CP 852 and instead tried to display text using CP 437 or CP 850. Lowercase letters are mainly correct, except for ű and ő. Ü/ü and Ö/ö are correct because CP 437 and CP 850 were made compatible with German. Although this is rare nowadays, it can still be seen in places such as printed prescriptions and cheques.
		CP 850	ÁRVÍZTÙRè TÜKÖRFÚRÓGÉP árvízt¹rï tükörfúrógép
		Windows-1250	µRVÖZTëRŠ TšK™RFéRŕGP rvˇztűr‹ tk"rfŁr˘g‚p	Both encodings are Central European, but the text is encoded with the DOS encoding and decoded with the Windows encoding. The use of ű is correct.
		Mac Roman	µRV÷ZTÎRä TöKôRFÈR‡GêP †rv°zt˚rã tÅkîrf£r¢gÇp	Also common in the days of DOS, this could be seen when Apple computers tried to display Hungarian text sent using DOS or Windows machines, as they would often default to Apple's own encoding.
	Windows-1250	Mac Roman	¡RVÕZT€R’ T‹K÷RF⁄R”G…P ·rvÌzt˚rı t¸kˆrf˙rÛgÈp
		CP 852	┴RV═ZT█RŇ T▄KÍRF┌RËG╔P ßrvÝztűr§ tŘk÷rf˙rˇgÚp	Both encodings are Central European, but the text is encoded with the Windows encoding and decoded with the DOS encoding. The use of ű is correct.
		Windows-1252	ÁRVÍZTÛRÕ TÜKÖRFÚRÓGÉP árvíztûrõ tükörfúrógép	The default Western European Windows encoding is used instead of the Central-European one. Only ő-Ő (õ-Õ) and ű-Ű (û-Û) are wrong, and the text is completely readable. This is the most common error nowadays; due to ignorance, it occurs often on webpages or even in printed media.
	UTF-8	Windows-1252	ÃRVÃZTÅ°RÅ TÃœKÃ–RFÃšRÃ"GÃ‰P Ã¡rvÃztÅ±rÅ‘ tÃ¼kÃ¶rfÃºrÃ³gÃ©p	Mainly caused by web services or webmail clients that are configured incorrectly or not tested for international usage (as the problem remains concealed for English texts). In this case the actual (often generated) content is in UTF-8, but some older software may default to localized encodings if UTF-8 is not explicitly specified in the HTML headers.
	UTF-8	Mac Roman	√ÅRV√çZT≈∞R≈ê T√úK√ñRF√öR√ìG√âP √°rv√≠zt≈±r≈ë t√ºk√∂rf√∫r√≥g√©p
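The quoted-printable rows differ from the others: the text is intact but still wrapped in its transfer encoding. A small Python sketch using the standard `quopri` module shows that undoing the transfer encoding and then applying the right character encoding restores the text. It also reproduces the Windows-1250 versus Windows-1252 row. The byte strings are shortened samples from the table, and the variable names are illustrative.

```python
import quopri

# Quoted-printable is a transfer encoding: undo it first, then apply
# the character encoding that the sender actually used.
qp_utf8 = b"=C3=81RV=C3=8DZT=C5=B0R=C5=90"
qp_latin2 = b"=C1RV=CDZT=DBR=D5"
from_utf8 = quopri.decodestring(qp_utf8).decode("utf-8")
from_latin2 = quopri.decodestring(qp_latin2).decode("iso-8859-2")
print(from_utf8, from_latin2)

# Windows-1250 text read as Windows-1252: only ű and ő change.
garbled = "árvíztűrő tükörfúrógép".encode("cp1250").decode("cp1252")
print(garbled)
```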

Original text	Source encoding	Target encoding	Result
Кракозябры
	Windows-1251	KOI8-R	йПЮЙНГЪАПШ
	KOI8-R	Windows-1251	лТБЛПЪСВТЩ
	KOI8-R	Windows-1252	ëÒÁËÏÚÑÂÒÙ
	MS-DOS 855		Çá ÆÖóÞ¢áñ
	Windows-1251		Êðàêîçÿáðû
	UTF-8		ÐšÑ€Ð°ÐºÐ¾Ð·ÑÐ±Ñ€Ñ‹
		KOI8-R	п я─п╟п╨п╬п╥я▐п╠я─я▀ (The second character is a non-breaking space)
		MS-DOS 855	лџЛђл░л║лЙлиЛЈл▒ЛђЛІ
		Windows-1251	РљСЂР°РєРѕР·СЏР±СЂС‹
		Mac Roman	–ö—Ä–∞–∫–æ–∑—è–±—Ä—ã
		Mac Cyrillic	–Ъ—А–∞–Ї–Њ–Ј—П–±—А—Л
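When the encodings involved are unknown, one possible approach is to try every plausible pair of wrong decoder and intended encoding, keeping the candidates that round-trip without error. The sketch below is an illustration rather than a method described in the source; the function name and the candidate list are assumptions, and a reader still has to pick the result that is actually readable.

```python
from itertools import permutations

# Hypothetical shortlist of encodings commonly seen with Cyrillic text.
CANDIDATES = ["utf-8", "cp1251", "koi8_r", "cp866", "cp1252"]

def candidate_repairs(garbled: str) -> dict:
    """Map (wrong_decoder, intended_encoding) -> repaired text."""
    results = {}
    for wrong, intended in permutations(CANDIDATES, 2):
        try:
            results[(wrong, intended)] = garbled.encode(wrong).decode(intended)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pair cannot explain the garbled text
    return results

repairs = candidate_repairs("йПЮЙНГЪАПШ")
for (wrong, intended), text in repairs.items():
    print(f"{wrong} -> {intended}: {text}")
```

Several pairs usually survive, because single-byte encodings accept almost any byte sequence. Only the pair that reverses the actual mistake yields meaningful text.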

Example	Source encoding	Target encoding	Result
Trăm năm trong cõi người ta 𤾓𢆥𥪞𡎝𠊛些 (Truyện Kiều, Nguyễn Du)
	UTF-8	Windows-1258	TrÄƒm nÄƒm trong cÃµi ngÆ°á»i ta đ¤¾“đ¢†¥đ¥ªđ¡đ ›äº›
		TCVN3	Tr¨m n¨m trong câi ngêi ta �¤¾��¢��¥¥ª��¡��¾ä��
		VNI (Windows)	Traêm naêm trong coõi ngöôøi ta ��
		Mac Roman	TrƒÉm nƒÉm trong c√µi ng∆∞·ªùi ta §æì¢Ü••™û°éù†äõ‰∫õ

Original text	Source encoding	Target encoding	Result
このメールは皆様へのメッセージです。
	UTF-8
		UTF-7	��̃��(�q��Y�_�C�G�b�g)
		EUC-JP	��＜�若��罕��吾��＜��祉�若�吾�с��
		Shift-JIS	縺薙�繝｡繝ｼ繝ｫ縺ｯ逧�ｧ倥∈縺ｮ繝｡繝�そ繝ｼ繧ｸ縺ｧ縺吶�
		Mac Roman	„Åì„ÅÆ„É°„Éº„É´„ÅØÁöÜÊßò„Å∏„ÅÆ„É°„ÉÉ„Çª„Éº„Ç∏„Åß„Åô„ÄÇ
		ISO 8859-6	كك�ك�ك�ك�ك�هن�ك�ك�ك�كك؛ك�ك�ك�كك
		Windows-1252	ã“ã®ãƒ¡ãƒ¼ãƒ«ã¯çš†æ§˜ã¸ã®ãƒ¡ãƒƒã‚»ãƒ¼ã‚¸ã§ã™ã€‚
	EUC-JP		¤³¤Î¥á¡¼¥ë¤Ï³§ÍÍ¤Ø¤Î¥á¥Ã¥»¡¼¥¸¤Ç¤¹¡£
	Shift-JIS		‚±‚Ìƒ[ƒ‹‚ÍŠF—l‚Ö‚ÌƒƒbƒZ[ƒW‚Å‚·B

Original text	Source encoding	Target encoding	Result	Note
三國志曹操傳	Big5	GB	�T瓣в变巨肚	Garbled characters with almost no hint of the original meaning. The character shown as � is not a valid code point in GB 2312.
文字化けテスト	Shift-JIS	GB	暥帤壔偗僥僗僩	Kana is displayed as characters with the 亻 (Chinese: 單人旁; pinyin: dānrénpáng) radical, while kanji become other characters. Many of the substitute characters are extremely uncommon in modern Chinese. Somewhat easy to identify because of the multiple consecutive 亻 characters.
디제이맥스 테크니카	EUC-KR	GB	叼力捞钙胶抛农聪墨	Random simplified characters which in most cases make no sense. Probably the easiest to identify because of the spaces between every few characters.

Arabic example	Browser rendering	Source encoding	Target encoding	Result
(Universal Declaration of Human Rights)
	الإعلان العالمى لحقوق الإنسان
		UTF-8	KOI8-R	ь╖ы└ь╔ь╧ы└ь╖ы├ ь╖ы└ь╧ь╖ы└ы┘ы┴ ы└ь╜ы┌ы┬ы┌ ь╖ы└ь╔ы├ьЁь╖ы├
			Windows-1250	Ř§Ů„ŘĄŘąŮ„Ř§Ů† Ř§Ů„ŘąŘ§Ů„Ů…Ů‰ Ů„ŘŮ‚ŮŮ‚ Ř§Ů„ŘĄŮ†ŘłŘ§Ů†
			Windows-1251	Ш§Щ„ШҐШ№Щ„Ш§Щ† Ш§Щ„Ш№Ш§Щ„Щ…Щ‰ Щ„ШЩ‚Щ€Щ‚ Ш§Щ„ШҐЩ†ШіШ§Щ†
			Windows-1252	Ø§Ù„Ø¥Ø¹Ù„Ø§Ù† Ø§Ù„Ø¹Ø§Ù„Ù…Ù‰ Ù„ØÙ‚ÙˆÙ‚ Ø§Ù„Ø¥Ù†Ø³Ø§Ù†
			Windows-1256	ط§ظ„ط¥ط¹ظ„ط§ظ† ط§ظ„ط¹ط§ظ„ظ…ظ‰ ظ„طظ‚ظˆظ‚ ط§ظ„ط¥ظ†ط³ط§ظ†
			ISO 8859-5	иЇй�иЅиЙй�иЇй� иЇй�иЙиЇй�й�й� й�ий�й�й� иЇй�иЅй�иГиЇй�
			ISO 8859-6	ظ�عظ�ظ�عظ�ع ظ�عظ�ظ�ععع عظععع ظ�عظ�عظ�ظ�ع
			CP 852	ěž┘äěąě╣┘äěž┘ć ěž┘äě╣ěž┘ä┘ů┘ë ┘äěş┘é┘ł┘é ěž┘äěą┘ćě│ěž┘ć
			CP 866	╪з┘Д╪е╪╣┘Д╪з┘Ж ╪з┘Д╪╣╪з┘Д┘Е┘Й ┘Д╪н┘В┘И┘В ╪з┘Д╪е┘Ж╪│╪з┘Ж
			Mac Arabic	ظ'عÑظ٪ظ٩عÑظ'عÜ ظ'عÑظ٩ظ'عÑعÖعâ عÑظ-عÇعàعÇ ظ'عÑظ٪عÜظ٣ظ'عÜ
			Mac Roman	ÿßŸÑÿ•ÿπŸÑÿßŸÜ ÿßŸÑÿπÿßŸÑŸÖŸâ ŸÑÿ≠ŸÇŸàŸÇ ÿßŸÑÿ•ŸÜÿ≥ÿßŸÜ
		Mac Arabic		«‰≈Ÿ‰«Ê†«‰Ÿ«‰ÂÈ†‰Õ‚Ë‚†«‰≈Ê”«Ê
		Windows-1256		«·≈⁄·«‰ «·⁄«·„Ï ·ÕﬁÊﬁ «·≈‰”«‰
		Windows-1256	Windows-1252	ÇáÅÚáÇä ÇáÚÇáãì áÍÞæÞ ÇáÅäÓÇä
