UTF-EBCDIC

From Wikipedia, the free encyclopedia
Character encoding for Unicode compatible with EBCDIC
Created by: IBM
Definitions: Unicode Technical Report #16
Based on: UTF-8
Transforms / Encodes: Unicode

UTF-EBCDIC is a character encoding capable of encoding all 1,112,064 valid character code points in Unicode using 1 to 5 bytes (in contrast to a maximum of 4 for UTF-8).[1] It is meant to be EBCDIC-friendly, so that legacy EBCDIC applications on mainframes can process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for existing ASCII-based systems. UTF-EBCDIC is specified in Unicode Technical Report #16.

To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first, creating what the specification calls an I8 sequence. The main difference between this encoding and UTF-8 is that it allows the code points U+0080 through U+009F (the C1 control codes) to be represented as a single byte, so that they can later be mapped to the corresponding EBCDIC control codes. To achieve this, UTF-8-Mod uses 101xxxxx instead of 10xxxxxx as the format for trailing bytes in a multi-byte sequence. Because each trailing byte can then hold only 5 bits rather than 6, some code points above U+03FF need more bytes in UTF-8-Mod than in UTF-8: U+0400 through U+07FF take three bytes instead of two, U+4000 through U+FFFF four instead of three, and U+40000 through U+10FFFF five instead of four.
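The first stage can be sketched in Python as follows. This is a minimal illustration written from the bit layout described above (single bytes up to U+009F, UTF-8 lead-byte prefixes, 101xxxxx trailing bytes), not code taken from the specification. The function name `utf8_mod_encode` is hypothetical, and rejecting surrogate code points is an assumption based on UTF-EBCDIC covering only the 1,112,064 valid code points.

```python
def utf8_mod_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as a UTF-8-Mod (I8) byte sequence.

    Hypothetical helper sketched from the layout described in this article.
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x9F:
        # ASCII and the C1 controls stay single bytes.
        return bytes([cp])
    # Pick the sequence length and lead-byte prefix (same prefixes as UTF-8).
    if cp <= 0x3FF:
        n, prefix = 2, 0xC0  # 110yyyyy + 1 trailing byte: 5 + 5 bits
    elif cp <= 0x3FFF:
        n, prefix = 3, 0xE0  # 1110zzzz + 2 trailing bytes: 4 + 10 bits
    elif cp <= 0x3FFFF:
        n, prefix = 4, 0xF0  # 11110www + 3 trailing bytes: 3 + 15 bits
    else:
        n, prefix = 5, 0xF8  # 111110vv + 4 trailing bytes: 2 + 20 bits
    trailing = []
    for _ in range(n - 1):
        trailing.append(0xA0 | (cp & 0x1F))  # 101xxxxx holds the low 5 bits
        cp >>= 5
    # Whatever bits remain go into the lead byte.
    return bytes([prefix | cp]) + bytes(reversed(trailing))


for cp in (0x41, 0x85, 0xA0, 0x400, 0x10FFFF):
    mod = utf8_mod_encode(cp)
    utf8 = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}: UTF-8-Mod {mod.hex(' ')} (len {len(mod)}), UTF-8 len {len(utf8)}")
```

The loop compares lengths with Python's own UTF-8 encoder: U+0085 (a C1 control) shrinks from two bytes to one, while U+0400 grows from two bytes to three.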

The UTF-8-Mod transformation leaves the data in an ASCII-based format (for example, U+0041 "A" is still encoded as 0x41), so each byte is then fed through a reversible (one-to-one) lookup table to produce the final UTF-EBCDIC encoding. For example, 0x41 in this table maps to 0xC1; thus the UTF-EBCDIC encoding of U+0041 (Unicode's "A") is 0xC1 (EBCDIC's "A").
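The second stage is a plain byte-for-byte table lookup, which the Python sketch below illustrates. It assumes a deliberately partial table, `PARTIAL_I8_TO_EBCDIC` (a hypothetical name), holding only the entries for space, "_", the digits and the Latin letters as they appear in the codepage layout of this article; the full table in UTR #16 covers all 256 byte values. Because the table is one-to-one, inverting it gives the decoding direction.

```python
# Hypothetical, deliberately partial I8 -> UTF-EBCDIC table. It holds only
# single-byte entries read off this article's codepage layout: space, "_",
# the digits and the Latin letters. The real table covers all 256 bytes.
PARTIAL_I8_TO_EBCDIC: dict[int, int] = {0x20: 0x40, 0x5F: 0x6D}
PARTIAL_I8_TO_EBCDIC.update({0x30 + k: 0xF0 + k for k in range(10)})  # 0-9
for first, ebcdic_first, count in (
    ("A", 0xC1, 9), ("J", 0xD1, 9), ("S", 0xE2, 8),  # A-I, J-R, S-Z
    ("a", 0x81, 9), ("j", 0x91, 9), ("s", 0xA2, 8),  # a-i, j-r, s-z
):
    PARTIAL_I8_TO_EBCDIC.update(
        {ord(first) + k: ebcdic_first + k for k in range(count)}
    )

# One-to-one, so the inverse table decodes.
PARTIAL_EBCDIC_TO_I8 = {e: i for i, e in PARTIAL_I8_TO_EBCDIC.items()}


def i8_to_utf_ebcdic(i8: bytes) -> bytes:
    """Map every I8 byte independently through the table (KeyError if absent)."""
    return bytes(PARTIAL_I8_TO_EBCDIC[b] for b in i8)


def utf_ebcdic_to_i8(data: bytes) -> bytes:
    """Inverse lookup, restoring the I8 (UTF-8-Mod) bytes."""
    return bytes(PARTIAL_EBCDIC_TO_I8[b] for b in data)


print(i8_to_utf_ebcdic(b"A").hex())  # U+0041 "A" -> c1
```

For these particular characters the output coincides with Python's built-in `cp037` codec (for example `"Hello World".encode("cp037")`); characters such as the square brackets do not.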

UTF-EBCDIC is rarely used, even on the EBCDIC-based mainframes for which it was designed. IBM EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support. For example, IBM Db2, COBOL, PL/I, Java and the IBM XML toolkit support UTF-16 on IBM mainframes.

Codepage layout


There are 160 characters with single-byte encodings in UTF-EBCDIC (compared to 128 in UTF-8). The single-byte portion follows IBM-1047 rather than IBM-37, as the positions of the square brackets show: CCSID 37 has [ and ] at hex BA and BB instead of at hex AD and BD respectively.
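The bracket difference can be checked with Python's built-in `cp037` codec, which implements CCSID 37. In this small sketch the UTF-EBCDIC positions are copied by hand from the table in this section, and `UTF_EBCDIC_BRACKETS` and `bracket_positions` are hypothetical names.

```python
# Square-bracket positions in the UTF-EBCDIC single-byte table of this section
# (the same positions as in IBM-1047).
UTF_EBCDIC_BRACKETS = {"[": 0xAD, "]": 0xBD}


def bracket_positions() -> dict[str, tuple[int, int]]:
    """Map each bracket to (UTF-EBCDIC byte, CCSID 37 byte)."""
    return {
        ch: (pos, ch.encode("cp037")[0])  # Python's cp037 codec = CCSID 37
        for ch, pos in UTF_EBCDIC_BRACKETS.items()
    }


for ch, (utf_ebcdic, ccsid37) in bracket_positions().items():
    print(f"{ch}  UTF-EBCDIC 0x{utf_ebcdic:02X}  CCSID 37 0x{ccsid37:02X}")
```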

UTF-EBCDIC
     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0x   NUL  SOH  STX  ETX  ST   HT   SSA  DEL  EPA  RI   SS2  VT   FF   CR   SO   SI
1x   DLE  DC1  DC2  DC3  OSC  LF   BS   ESA  CAN  EM   PU2  SS3  FS   GS   RS   US
2x   PAD  HOP  BPH  NBH  IND  NEL  ETB  ESC  HTS  HTJ  VTS  PLD  PLU  ENQ  ACK  BEL
3x   DCS  PU1  SYN  STS  CCH  MW   SPA  EOT  SOS  SGCI SCI  CSI  DC4  NAK  PM   SUB
4x   SP   ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    .    <    (    +    |
5x   &    ·    ·    ·    ·    ·    ·    ·    ·    ·    !    $    *    )    ;    ^
6x   -    /    ·    ·    ·    ·    ·    ·    ·    ·    ·    ,    %    _    >    ?
7x   ·    ·    ·    ·    ×    ×    ×    ×    ×    `    :    #    @    '    =    "
8x   2    a    b    c    d    e    f    g    h    i    2    2    2    2    2    2
9x   2    j    k    l    m    n    o    p    q    r    2    2    2    2    2    2
Ax   2    ~    s    t    u    v    w    x    y    z    2    2    2    [    2    2
Bx   2    2    2    2    2    2    2    ×    3    3    3    3    3    ]    3    3
Cx   {    A    B    C    D    E    F    G    H    I    3    3    3    3    3    3
Dx   }    J    K    L    M    N    O    P    Q    R    3    3    4*   4    4    4
Ex   \    4    S    T    U    V    W    X    Y    Z    4    4    4    5*   5*   ×
Fx   0    1    2    3    4    5    6    7    8    9    ×    ×    ×    ×    ×    APC
  2, 3, 4, 5 (outside row Fx, where 0 to 9 are the digit characters): start byte for a sequence of that many bytes.
  4*, 5*: start byte where not all combinations of continuation bytes are valid, either because some would form an invalid overlong encoding or because some would encode a code point greater than U+10FFFF.
  ·: continuation byte, adding 5 bits to the code point.
  ×: unused, including lead bytes that can only start an invalid overlong form. For example, 0x76 is unused because even 0x76 0x73 (which maps to the UTF-8-Mod sequence 0xC2 0xBF) would merely be an overlong encoding of U+005F (properly encoded as UTF-8-Mod 0x5F, UTF-EBCDIC 0x6D).
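A decoder has to reject the unused and overlong forms described in the legend above. The following minimal Python sketch works on a single I8 sequence, that is, after the lookup table has already turned UTF-EBCDIC bytes back into UTF-8-Mod bytes (so EBCDIC 0x76 0x73 has become 0xC2 0xBF). The function name `utf8_mod_decode_one` is hypothetical, and the minimum code point for each sequence length is derived from the 5-bit trailing-byte layout rather than quoted from the specification.

```python
# Smallest code point that genuinely needs a sequence of each length.
MIN_FOR_LENGTH = {1: 0x0, 2: 0xA0, 3: 0x400, 4: 0x4000, 5: 0x40000}


def utf8_mod_decode_one(seq: bytes) -> int:
    """Decode exactly one UTF-8-Mod (I8) sequence, rejecting invalid forms."""
    if not seq:
        raise ValueError("empty sequence")
    lead = seq[0]
    if lead <= 0x9F:
        n, cp = 1, lead          # single byte, including C1 controls
    elif 0xC0 <= lead <= 0xDF:
        n, cp = 2, lead & 0x1F
    elif 0xE0 <= lead <= 0xEF:
        n, cp = 3, lead & 0x0F
    elif 0xF0 <= lead <= 0xF7:
        n, cp = 4, lead & 0x07
    elif 0xF8 <= lead <= 0xFB:
        n, cp = 5, lead & 0x03
    else:
        # 0xA0-0xBF are continuation bytes; 0xFC-0xFF would only start
        # longer forms, beyond U+10FFFF.
        raise ValueError(f"0x{lead:02X} cannot start a sequence")
    if len(seq) != n:
        raise ValueError(f"expected {n} bytes, got {len(seq)}")
    for b in seq[1:]:
        if (b & 0xE0) != 0xA0:   # trailing bytes must match 101xxxxx
            raise ValueError(f"0x{b:02X} is not a continuation byte")
        cp = (cp << 5) | (b & 0x1F)
    if cp < MIN_FOR_LENGTH[n]:
        raise ValueError(f"overlong encoding of U+{cp:04X}")
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    return cp


# The legend's example: EBCDIC 0x76 0x73 becomes I8 0xC2 0xBF.
try:
    utf8_mod_decode_one(b"\xc2\xbf")
except ValueError as err:
    print(err)
```

The same minimum-value check also explains why an I8 lead byte of 0xE0 can never be used: its three-byte sequences only reach U+03FF, which already fits in two bytes.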

Oracle UTFE


Oracle UTFE is an Oracle Database variation of Unicode 3.0 UTF-8, similar to the CESU-8 variant of UTF-8, in which supplementary characters are encoded as two 4-byte sequences (one for each UTF-16 surrogate) rather than as a single 4- or 5-byte sequence. It is used only on EBCDIC platforms.[2]
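The size difference can be illustrated by counting bytes. This Python sketch assumes that, as in CESU-8, each UTF-16 surrogate of a supplementary character is encoded on its own using the UTF-8-Mod length rules; it does not reproduce Oracle's implementation, and the helper names `i8_length`, `to_surrogates` and `utfe_vs_utf_ebcdic_length` are hypothetical.

```python
def i8_length(cp: int) -> int:
    """Number of bytes UTF-8-Mod (and hence UTF-EBCDIC) uses for cp."""
    for limit, n in ((0x9F, 1), (0x3FF, 2), (0x3FFF, 3), (0x3FFFF, 4)):
        if cp <= limit:
            return n
    return 5


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"not a supplementary code point: {cp:#x}")
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)


def utfe_vs_utf_ebcdic_length(cp: int) -> tuple[int, int]:
    """(UTF-EBCDIC bytes, assumed UTFE bytes) for a supplementary character."""
    high, low = to_surrogates(cp)
    # Assumption: UTFE encodes each surrogate separately, as CESU-8 does.
    return i8_length(cp), i8_length(high) + i8_length(low)


for cp in (0x10000, 0x1F600, 0x10FFFF):
    print(f"U+{cp:04X}", utfe_vs_utf_ebcdic_length(cp))
```

For U+1F600, for example, the sketch reports 4 bytes in UTF-EBCDIC against 4 + 4 = 8 bytes for the surrogate pair D83D DE00, since every surrogate falls in the U+4000 to U+3FFFF range that needs four UTF-8-Mod bytes.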


References

  1. "UTR #16: UTF-EBCDIC". www.unicode.org. Retrieved 2021-02-23. "You need to search at most five bytes (seven bytes, if the full range of 31 bits of ISO/IEC 10646 is considered) backwards"
  2. Baird, Cathy; Chiba, Dan; Chu, Winson; Fan, Jessica; Ho, Claire; Law, Simon; Lee, Geoff; Linsley, Peter; Matsuda, Keni; Oscroft, Tamzin; Takeda, Shige; Tanaka, Linus; Tozawa, Makoto; Trute, Barry; Tsujimoto, Mayumi; Wu, Ying; Yau, Michael; Yu, Tim; Wang, Chao; Wong, Simon; Zhang, Weiran; Zheng, Lei; Zhu, Yan; Moore, Valarie (2002) [1996]. "Appendix A: Locale Data". Oracle9i Database Globalization Support Guide (PDF) (Release 2 (9.2) ed.). Oracle Corporation. Oracle A96529-01. Archived (PDF) from the original on 2017-02-14. Retrieved 2017-02-14.

Retrieved from "https://en.wikipedia.org/w/index.php?title=UTF-EBCDIC&oldid=1222414866"