UTF-1

From Wikipedia, the free encyclopedia
Obsolete multibyte encoding for Unicode
MIME / IANA: ISO-10646-UTF-1
Language: International
Current status: Obscure, of mainly historical interest
Classification: Unicode Transformation Format, extended ASCII, variable-width encoding
Extends: US-ASCII
Transforms / Encodes: ISO/IEC 10646 (Unicode)
Succeeded by: UTF-8

UTF-1 is an obsolete method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes searching for substrings and error recovery difficult. It reuses the ASCII printing characters for multi-byte encodings, making it unsuited for some uses (for instance, Unix filenames cannot contain the byte value used for the forward slash). UTF-1 is also slow to encode or decode because it divides and multiplies by 190, a number that is not a power of 2. Because of these issues, it did not gain acceptance and was quickly replaced by UTF-8.
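
The forward-slash problem can be illustrated with one row of the comparison table later in this article. The sketch below is only an illustration: it takes the UTF-1 bytes for U+D7FF from that table rather than computing them, and the variable names are chosen here for the example.

```python
# UTF-1 bytes for U+D7FF, copied from the comparison table in this article.
utf1 = bytes.fromhex("F7 2F C3")
# UTF-8 bytes for the same code point, produced by Python's built-in codec.
utf8 = chr(0xD7FF).encode("utf-8")

# 0x2F is the ASCII forward slash, so a byte-oriented path parser
# would see a directory separator inside the UTF-1 sequence.
print(b"/" in utf1)      # True
print(b"/" in utf8)      # False
print(utf1.split(b"/"))  # [b'\xf7', b'\xc3']
```

A program that splits on the slash byte cuts the UTF-1 sequence in two, whereas every byte of the UTF-8 sequence is 0x80 or higher.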

Design


Similar to UTF-8, UTF-1 is a variable-width encoding that is backwards-compatible with ASCII. Every Unicode code point is represented by either a single byte, or a sequence of two, three, or five bytes. All ASCII code points are a single byte (the code points U+0080 through U+009F are also single bytes).

UTF-1 does not use the C0 and C1 control codes or the space character in multi-byte encodings: a byte in the range 0x00–0x20 or 0x7F–0x9F always stands for the corresponding code point. This design with 66 protected characters tried to be ISO/IEC 2022 compatible.

UTF-1 uses "modulo 190" arithmetic (256 − 66 = 190). For comparison, UTF-8 protects all 128 ASCII characters and needs one bit for this, and a second bit to make it self-synchronizing, resulting in "modulo 64" arithmetic (8 − 2 = 6; 2^6 = 64). BOCU-1 protects only the minimal set required for MIME compatibility (0x00, 0x07–0x0F, 0x1A–0x1B, and 0x20), resulting in "modulo 243" arithmetic (256 − 13 = 243).
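
The three moduli quoted above can be checked by counting bytes. The following is a small sketch of that arithmetic; the set and variable names are made up for this example and do not come from any standard.

```python
# Bytes UTF-1 never uses inside a multi-byte sequence:
# 0x00-0x20 (C0 controls and space) and 0x7F-0x9F (DEL and C1 controls).
UTF1_PROTECTED = set(range(0x00, 0x21)) | set(range(0x7F, 0xA0))

# Bytes BOCU-1 protects, as listed in the paragraph above.
BOCU1_PROTECTED = {0x00} | set(range(0x07, 0x10)) | {0x1A, 0x1B} | {0x20}

utf1_radix = 256 - len(UTF1_PROTECTED)    # values available per trail byte
bocu1_radix = 256 - len(BOCU1_PROTECTED)
utf8_radix = 2 ** (8 - 2)                 # 6 payload bits per continuation byte

print(len(UTF1_PROTECTED), utf1_radix)    # 66 190
print(len(BOCU1_PROTECTED), bocu1_radix)  # 13 243
print(utf8_radix)                         # 64
```

Because 190 is not a power of 2, splitting a code point into UTF-1 trail bytes needs real division and remainder operations, whereas UTF-8 can use shifts and masks.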


UTF-1 byte ranges

First code point  Last code point  Byte 1  Byte 2        Byte 3        Byte 4        Byte 5
U+0000            U+009F           00–9F
U+00A0            U+00FF           A0      A0–FF
U+0100            U+4015           A1–F5   21–7E, A0–FF
U+4016            U+38E2D          F6–FB   21–7E, A0–FF  21–7E, A0–FF
U+38E2E           U+7FFFFFFF       FC–FF   21–7E, A0–FF  21–7E, A0–FF  21–7E, A0–FF  21–7E, A0–FF
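
The byte ranges in the table above can be turned into an encoder. The sketch below is reconstructed from those ranges and from the modulo-190 description, not quoted from the standard. The names `utf1_encode`, `_trail`, and `RADIX` are hypothetical. It assumes that each trail byte carries one base-190 digit, with digits 0–93 mapped to 21–7E and digits 94–189 mapped to A0–FF.

```python
RADIX = 190  # 256 byte values minus the 66 protected ones


def _trail(digit: int) -> int:
    """Map a base-190 digit (0..189) to a trail byte in 21-7E or A0-FF."""
    return digit + 0x21 if digit < 0x5E else digit + 0x42


def utf1_encode(cp: int) -> bytes:
    """Encode one code point (0..0x7FFFFFFF) as UTF-1 bytes."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point outside the 31-bit range")
    if cp < 0xA0:                      # one byte: 00-9F
        return bytes([cp])
    if cp < 0x100:                     # two bytes: A0, then the value itself
        return bytes([0xA0, cp])
    if cp < 0x4016:                    # two bytes: lead A1-F5
        y = cp - 0x100
        return bytes([0xA1 + y // RADIX, _trail(y % RADIX)])
    if cp < 0x38E2E:                   # three bytes: lead F6-FB
        y = cp - 0x4016
        return bytes([
            0xF6 + y // RADIX**2,
            _trail(y // RADIX % RADIX),
            _trail(y % RADIX),
        ])
    y = cp - 0x38E2E                   # five bytes: lead FC-FF
    return bytes([
        0xFC + y // RADIX**4,
        _trail(y // RADIX**3 % RADIX),
        _trail(y // RADIX**2 % RADIX),
        _trail(y // RADIX % RADIX),
        _trail(y % RADIX),
    ])


print(utf1_encode(0x07FF).hex(" "))    # aa 72
print(utf1_encode(0x10FFFF).hex(" "))  # fc 21 39 6e 6c
```

Under these assumptions the function reproduces the rows of the comparison table in this article, and every multi-byte step relies on division and remainder by 190.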
Code point  UTF-8              UTF-1
U+007F      7F                 7F
U+0080      C2 80              80
U+009F      C2 9F              9F
U+00A0      C2 A0              A0 A0
U+00BF      C2 BF              A0 BF
U+00C0      C3 80              A0 C0
U+00FF      C3 BF              A0 FF
U+0100      C4 80              A1 21
U+015D      C5 9D              A1 7E
U+015E      C5 9E              A1 A0
U+01BD      C6 BD              A1 FF
U+01BE      C6 BE              A2 21
U+07FF      DF BF              AA 72
U+0800      E0 A0 80           AA 73
U+0FFF      E0 BF BF           B5 48
U+1000      E1 80 80           B5 49
U+4015      E4 80 95           F5 FF
U+4016      E4 80 96           F6 21 21
U+D7FF      ED 9F BF           F7 2F C3
U+E000      EE 80 80           F7 3A 79
U+F8FF      EF A3 BF           F7 5C 3C
U+FDD0      EF B7 90           F7 62 BA
U+FDEF      EF B7 AF           F7 62 D9
U+FEFF      EF BB BF           F7 64 4C
U+FFFD      EF BF BD           F7 65 AD
U+FFFE      EF BF BE           F7 65 AE
U+FFFF      EF BF BF           F7 65 AF
U+10000     F0 90 80 80        F7 65 B0
U+38E2D     F0 B8 B8 AD        FB FF FF
U+38E2E     F0 B8 B8 AE        FC 21 21 21 21
U+FFFFF     F3 BF BF BF        FC 21 37 B2 7A
U+100000    F4 80 80 80        FC 21 37 B2 7B
U+10FFFF    F4 8F BF BF        FC 21 39 6E 6C
U+7FFFFFFF  FD BF BF BF BF BF  FD BD 2B B9 40

Although modern Unicode ends at U+10FFFF, both UTF-1 and UTF-8 were designed to encode the complete 31 bits of the original Universal Character Set (UCS-4), and the last entry in this table shows this original final code point.
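
Decoding the last table entry back to a number shows that it is indeed the largest 31-bit value. The decoder below is a minimal sketch inferred from the byte ranges given in this article. `utf1_decode_one` and `_digit` are hypothetical names, and the error handling is an assumption rather than behaviour required by the standard.

```python
RADIX = 190  # number of byte values allowed in trail positions


def _digit(b: int) -> int:
    """Map a trail byte (21-7E or A0-FF) back to a base-190 digit."""
    if 0x21 <= b <= 0x7E:
        return b - 0x21
    if 0xA0 <= b <= 0xFF:
        return b - 0x42
    raise ValueError(f"invalid UTF-1 trail byte 0x{b:02X}")


def utf1_decode_one(data: bytes) -> tuple[int, int]:
    """Return (code point, bytes consumed) for the sequence starting data."""
    lead = data[0]
    if lead < 0xA0:                       # single byte stands for itself
        return lead, 1
    if lead == 0xA0:                      # A0 is followed by the raw value A0-FF
        if len(data) < 2 or data[1] < 0xA0:
            raise ValueError("invalid byte after lead 0xA0")
        return data[1], 2
    if lead <= 0xF5:
        n_trail, first_lead, base = 1, 0xA1, 0x100
    elif lead <= 0xFB:
        n_trail, first_lead, base = 2, 0xF6, 0x4016
    else:
        n_trail, first_lead, base = 4, 0xFC, 0x38E2E
    if len(data) < 1 + n_trail:
        raise ValueError("truncated UTF-1 sequence")
    value = lead - first_lead             # most significant base-190 digit
    for b in data[1:1 + n_trail]:
        value = value * RADIX + _digit(b)
    return base + value, 1 + n_trail


cp, used = utf1_decode_one(bytes.fromhex("FD BD 2B B9 40"))
print(hex(cp), used)          # 0x7fffffff 5
print(cp == 2**31 - 1)        # True
```

The decoder must start at a known sequence boundary. Since trail bytes share their values with ASCII printing characters and with lead bytes, it cannot find the next boundary from an arbitrary offset, which is the lack of self-synchronization described in the introduction.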

Retrieved from "https://en.wikipedia.org/w/index.php?title=UTF-1&oldid=1314818407"