Double-byte character set

From Wikipedia, the free encyclopedia
"DBCS" redirects here. For other uses, see DBCS (disambiguation).

A double-byte character set (DBCS) is a character encoding in which either all characters (including control characters) are encoded in two bytes, or merely every graphic character not representable by an accompanying single-byte character set (SBCS) is encoded in two bytes (Han characters would generally comprise most of these two-byte characters). A DBCS supports national languages that contain many unique characters or symbols (one byte can represent at most 256 characters, while two bytes can represent up to 65,536 characters). Examples of such languages include Japanese and Chinese. Hangul does not contain as many characters, but KS X 1001 supports both Hangul and Hanja, and uses two bytes per character.
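The byte-count arithmetic can be checked with a short Python sketch. The helper name `code_space` is hypothetical, and the example assumes that Python's built-in `euc_kr` codec is an acceptable stand-in for KS X 1001, which it carries in its two-byte range.

```python
def code_space(num_bytes: int) -> int:
    """Upper bound on distinct codes for a fixed width of num_bytes bytes."""
    return 2 ** (8 * num_bytes)


print(code_space(1))  # 256
print(code_space(2))  # 65536

# Assumption: the "euc_kr" codec is used here to reach KS X 1001 characters.
hangul = "한"  # a Hangul syllable
hanja = "韓"   # a Hanja character
for ch in (hangul, hanja):
    print(ch, len(ch.encode("euc_kr")))  # each takes two bytes
```

These figures are upper bounds. Real double-byte sets restrict which byte values may appear in each position, so they define far fewer than 65,536 characters.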

In CJK computing


The term DBCS traditionally refers to a character encoding where each graphic character is encoded in two bytes.

In an 8-bit code, such as Big-5 or Shift JIS, a character from the DBCS is represented with a lead (first) byte that has the most significant bit set (i.e., a value of 0x80 or greater, which does not fit in seven bits), and the DBCS is paired with a single-byte character set (SBCS). For the practical reason of maintaining compatibility with unmodified, off-the-shelf software, the SBCS is associated with half-width characters and the DBCS with full-width characters. In a 7-bit code such as ISO-2022-JP, escape sequences or shift codes are used to switch between the SBCS and DBCS.
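As a rough illustration of both schemes, the following Python sketch splits a Shift JIS byte string into one-byte and two-byte units by inspecting lead bytes, and then shows the escape sequence that Python's `iso2022_jp` codec emits. The function names are hypothetical. The lead-byte ranges (0x81–0x9F and 0xE0–0xFC) are the commonly documented Shift JIS ranges and are narrower than "any byte with the high bit set", because 0xA1–0xDF holds single-byte half-width katakana.

```python
def is_sjis_lead_byte(b: int) -> bool:
    """True if b starts a two-byte Shift JIS character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_sjis(data: bytes) -> list[bytes]:
    """Split Shift JIS bytes into per-character byte units."""
    units = []
    i = 0
    while i < len(data):
        width = 2 if is_sjis_lead_byte(data[i]) else 1
        units.append(data[i:i + width])
        i += width
    return units


# ASCII letter, half-width katakana (SBCS), full-width hiragana (DBCS).
sample = "Aｱあ".encode("shift_jis")
print(split_sjis(sample))  # [b'A', b'\xb1', b'\x82\xa0']

# In 7-bit ISO-2022-JP, ESC $ B switches to the double-byte set
# and ESC ( B switches back to ASCII.
seven_bit = "Aあ".encode("iso2022_jp")
print(seven_bit)
```

A scanner like this has to start at a character boundary: a trail byte can fall in the same range as a lead byte or an ASCII character, which is one reason unmodified single-byte software can mishandle such text.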

Sometimes, the use of the term "DBCS" can imply an underlying structure that does not comply with ISO 2022. For example, "DBCS" can sometimes mean a double-byte encoding that is specifically not Extended Unix Code (EUC).

This original meaning of DBCS is different from what some consider correct usage today. Some insist that these character encodings be properly called multi-byte character sets (MBCS) or variable-width encodings, because character encodings such as EUC-JP, EUC-TW, GB 18030, and UTF-8 use more than two bytes for some characters, and all of these, along with EUC-KR, encode other characters in a single byte.
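The variable-width behaviour is easy to observe with Python's built-in codecs. The sketch below is illustrative only: the helper `widths` is a hypothetical name, and the sample characters were chosen by hand to exercise different byte lengths.

```python
def widths(encoding: str, text: str) -> list[int]:
    """Sorted list of distinct per-character byte lengths in text."""
    return sorted({len(ch.encode(encoding)) for ch in text})


# UTF-8 uses one to four bytes per character.
print("utf-8  ", widths("utf-8", "Aéあ😀"))    # [1, 2, 3, 4]

# GB 18030 uses one, two, or four bytes per character.
print("gb18030", widths("gb18030", "A汉😀"))   # [1, 2, 4]

# EUC-KR mixes one-byte ASCII with two-byte KS X 1001 characters.
print("euc_kr ", widths("euc_kr", "A한"))      # [1, 2]
```

In each case a program cannot assume a fixed number of bytes per character, which is the point behind the MBCS and variable-width terminology.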

Ambiguity


Some people use DBCS to mean the UTF-16 and UTF-8 encodings, while other people use the term DBCS to mean older (pre-Unicode) character encodings that use more than one byte per character. Shift JIS, GB 2312 and Big5 are a few character encodings that can contain more than one byte per character, but even using the term DBCS for these character encodings is incorrect terminology, because these character encodings are really variable-width encodings (as are both UTF-16 and UTF-8). Some IBM mainframes do have true DBCS code pages, which contain only the double-byte portion of a multi-byte code page.
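To see why UTF-16 is not strictly double-byte, the following Python sketch counts 16-bit code units per character. The helper name `utf16_code_units` is hypothetical; the encodings used are Python's standard `utf-16-le` and `utf-16-be` codecs.

```python
def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode ch in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


for ch in ("A", "あ", "😀"):
    print(repr(ch), utf16_code_units(ch))

# Characters outside the Basic Multilingual Plane need a surrogate pair,
# i.e. four bytes. U+1F600 becomes D83D DE00.
print("😀".encode("utf-16-be").hex())  # d83dde00
```

Characters in the Basic Multilingual Plane take one code unit (two bytes), while supplementary characters take two code units, so UTF-16 is a variable-width encoding as well.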

If a person uses the term "DBCS enablement" for software internationalization, they are using ambiguous terminology. They either mean they want to write software for East Asian markets using older technology with code pages, or they are planning on using Unicode. Sometimes this term also implies translation into an East Asian language. Usually "Unicode enablement" means internationalizing software by using Unicode, and "DBCS enablement" means internationalizing software by using the mutually incompatible character encodings of the various East Asian countries. Since Unicode, unlike many other character encodings, supports all the major languages in East Asia, it is generally easier to enable and maintain software that uses Unicode. DBCS (non-Unicode) enablement is usually only desired when much older operating systems or applications do not support Unicode.
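The practical difference between the two approaches can be sketched in Python by trying to encode one mixed-language string with several legacy codecs and with UTF-8. The helper `can_encode` and the sample text are illustrative assumptions, and the codec names are those of Python's standard library rather than of any particular operating-system code page.

```python
def can_encode(text: str, encoding: str) -> bool:
    """True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Japanese kana, Korean Hangul, and simplified Chinese in one string.
text = "かな 한글 汉字"

for enc in ("shift_jis", "euc_kr", "gb2312", "big5", "utf-8"):
    print(f"{enc:10} {can_encode(text, enc)}")
```

Each legacy encoding handles the text of its own market (for example, `can_encode("かな", "shift_jis")` is true) but rejects the mixed string, whereas UTF-8 accepts all of it.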

TBCS


A triple-byte character set (TBCS) is a character encoding in which characters (including control characters) are encoded in three bytes.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Double-byte_character_set&oldid=1270420864"