A CCSID (coded character set identifier) is a 16-bit number that represents a particular encoding of a specific code page. For example, Unicode is a code page that has several character encoding schemes (referred to as "transformation formats"), including UTF-8, UTF-16 and UTF-32, but which may or may not actually be accompanied by a CCSID number to indicate that this encoding is being used.
The terms code page and CCSID are often used interchangeably, even though they are not synonymous. A code page may be only part of what makes up a CCSID. The following definitions from IBM help to illustrate this point:
A glyph is the actual physical pattern of pixels or ink that shows up on a display or printout.
A character is a concept that covers all glyphs associated with a certain symbol. For instance, "F", "F", "F", "F", "F", and "F" are all different glyphs, but use the same character. The various modifiers (bold, italic, underline, color, and font) do not change the fact that these glyphs represent U+0046 F LATIN CAPITAL LETTER F.
A character set contains the characters necessary to allow a particular human to carry on a meaningful interaction with the computer. It does not specify how those characters are represented in a computer.[1] This level is the first one to separate characters into various alphabets (Latin, Arabic, Hebrew, Cyrillic, and so on) or ideographic groups (e.g., Chinese, Korean). It corresponds to a "character repertoire" in the Unicode encoding model.
A code page represents a particular assignment of code point values to characters.[1] It corresponds to a "coded character set" in the Unicode encoding model. A code point for a character is the computer's internal representation of that character in a given code page.[1] Many characters are represented by different code points in different code pages. Certain character sets can be adequately represented with single-byte code pages (which have a maximum of 256 code points, hence a maximum of 256 characters), but many require more than that. Examples of the latter include JIS X 0208 and Unicode.
An encoding scheme is the byte format of a code page. It maps code point values to sequences of one or more byte values in a computer.[2] For example, UTF-8 and UTF-16BE are two encodings of the same Unicode code page; they vary only in how many bytes are needed to represent a particular Unicode character value, how that value is contained within those bytes, and how the presence of Unicode information is indicated. In IBM's character data representation architecture (CDRA), the encoding scheme is typically represented with an ESID (encoding scheme identifier).[3] EUC and ISO-2022 are other examples of encoding schemes.
A coded character set identifier (CCSID) contains all of the information necessary to assign and preserve the meaning and rendering of characters through various stages of processing and interchange. This information always includes at least one code page, but may include multiple code pages of differing byte-lengths. The CCSID also has an associated encoding scheme that governs how various code points are to be handled. This mechanism allows a program to recognize bidirectional orientation, character shaping (mainly of Arabic characters), and other complex encoding information.
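The layers defined above (character, code page, encoding scheme) can be illustrated with a minimal Python sketch. It assumes that Python's built-in codecs (`ascii`, `cp037`, `cp437`, `latin-1`, `utf-8`, `utf-16-be`) are acceptable stand-ins for code pages and encoding schemes; they are not CCSIDs, and the helper `code_point_in` is a hypothetical name introduced only for this illustration.

```python
import unicodedata

# Character: an abstract symbol, independent of font, weight, or colour.
ch = "F"
char_name = unicodedata.name(ch)   # name of the abstract character
unicode_code_point = ord(ch)       # its code point in the Unicode code page


def code_point_in(character: str, codec: str) -> int:
    """Return the code point of `character` in a single-byte code page.

    Hypothetical helper: a Python codec name stands in for a code page.
    """
    encoded = character.encode(codec)
    if len(encoded) != 1:
        raise ValueError(f"{codec!r} does not map {character!r} to one byte")
    return encoded[0]


# Code page: the same character has different code points in different code pages.
a_ascii = code_point_in("A", "ascii")
a_ebcdic = code_point_in("A", "cp037")      # an EBCDIC code page
e_cp437 = code_point_in("\u00e9", "cp437")    # e with acute accent
e_latin1 = code_point_in("\u00e9", "latin-1")

# Encoding scheme: one Unicode code point, two different byte sequences.
euro = "\u20ac"
euro_utf8 = euro.encode("utf-8")
euro_utf16be = euro.encode("utf-16-be")

print(char_name, hex(unicode_code_point))
print(hex(a_ascii), hex(a_ebcdic), hex(e_cp437), hex(e_latin1))
print(euro_utf8.hex(), euro_utf16be.hex())
```

The glyph level has no counterpart here, because a glyph is a property of rendering, not of the stored data.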
All three of these variant Shift-JIS CCSIDs (932, 942 and 5028) are multi-byte character sets (MBCS): the single-byte character set (SBCS) portion of each CCSID is different, while the double-byte character set (DBCS) portion is the same across each CCSID. CCSID 5028 uses an updated code page 897 called CCSID 4993. CCSID 932 uses the original code page 897, which is CCSID 897. CCSID 942 uses a different SBCS from the other two CCSIDs, which is 1041.
Also notice that CCSIDs 5028 and 4993 each differ by 4096 (1000 in hexadecimal) from the predecessor CCSID with the same code page identifier (932 and 897, respectively). This is a common way that CDRA denotes an upgraded CCSID.
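The arithmetic behind this convention can be checked with a short sketch. The constant and function names are hypothetical, the table of SBCS portions contains only the values quoted above, and the sketch assumes nothing beyond the single 4096 offset described in the text; it does not claim that every CCSID above 4096 is an upgrade of another.

```python
UPGRADE_OFFSET = 0x1000  # 4096, the difference noted in the text

# SBCS portion of each Shift-JIS variant, as quoted in the text.
SBCS_PORTION = {932: 897, 5028: 4993, 942: 1041}


def predecessor(ccsid: int) -> int:
    """Return the CCSID that `ccsid` would upgrade under the +4096 convention.

    Hypothetical helper: it only performs the subtraction and does not
    verify that the result is a registered CCSID.
    """
    if not 0 <= ccsid <= 0xFFFF:
        raise ValueError("a CCSID is a 16-bit number")
    if ccsid < UPGRADE_OFFSET:
        raise ValueError("no predecessor under the +4096 convention")
    return ccsid - UPGRADE_OFFSET


# In hexadecimal the relationship is visible in the top digit:
# 932 is 0x03a4 and 5028 is 0x13a4; 897 is 0x0381 and 4993 is 0x1381.
for upgraded in (5028, 4993):
    print(f"{upgraded} (0x{upgraded:04x}) -> "
          f"{predecessor(upgraded)} (0x{predecessor(upgraded):04x})")
```

The same offset relates the SBCS portions of the two CCSIDs: `predecessor(SBCS_PORTION[5028])` equals `SBCS_PORTION[932]`.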
There are a few reasons for this complexity:
Many of the CCSIDs are used in IBM databases, like IBM Db2, where a database field only supports an SBCS, DBCS or MBCS string. CCSIDs allow programs to tell which one is being used.
When characters are added or replaced, as with the introduction of the euro currency sign, a different CCSID is used, so one can tell whether stored strings support those character additions. This versioning is important for the integrity of the data.
It enables reuse of resources among similar CCSIDs.[7]
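The versioning point can be illustrated with a sketch in which the same stored byte means different things depending on the tag that accompanies it. The sketch uses ISO 8859-1 and ISO 8859-15, which differ at byte 0xA4 (generic currency sign versus euro sign). The tag values 819 and 923 are used on the assumption that they are the CCSIDs corresponding to those two encodings; the mapping table and the function `decode_tagged` are hypothetical and are not part of any IBM API.

```python
# Assumed mapping from a CCSID tag to a Python codec name (illustrative only).
CODEC_FOR_TAG = {
    819: "latin-1",      # ISO 8859-1, no euro sign
    923: "iso8859-15",   # ISO 8859-15, euro sign at 0xA4
}


def decode_tagged(data: bytes, tag: int) -> str:
    """Decode `data` according to the tag stored alongside it."""
    try:
        codec = CODEC_FOR_TAG[tag]
    except KeyError:
        raise ValueError(f"no codec known for tag {tag}") from None
    return data.decode(codec)


stored = b"\xa4"
print(repr(decode_tagged(stored, 819)))  # generic currency sign, U+00A4
print(repr(decode_tagged(stored, 923)))  # euro sign, U+20AC
```

Without the tag, a reader of the stored byte could not tell which of the two characters was intended, which is the data-integrity concern described above.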