T.51/ISO/IEC 6937

From Wikipedia, the free encyclopedia
(Redirected from ISO 6937)
ITU-T Recommendation T.51
Latin based coded character sets for telematic services
Status: In force
Year started: 1984
Latest version: (09/92), September 1992
Organization: ITU-T
Committee: Study Group VIII
Related standards: T.61, ETS 300 706, ISO/IEC 10367, ISO/IEC 2022, ISO 5426
Domain: encoding
License: Freely available
Website: https://www.itu.int/rec/T-REC-T.51

T.51
Alias(es):
  • Code page 20269
  • ISO-IR-90 (old)
  • ISO-IR-142 (old)
  • ISO-IR-156
Standard
Based on: ITU T.61
Other related encoding(s)

T.51 / ISO/IEC 6937:2001, Information technology - Coded graphic character set for text communication - Latin alphabet, is a multibyte extension of ASCII, or more precisely of ISO/IEC 646-IRV.[1] It was developed jointly with ITU-T (then CCITT) for telematic services under the name T.51, and first became an ISO standard in 1983. Certain byte codes are used as lead bytes for letters with diacritics. The value of the lead byte usually indicates which diacritic the letter carries, and the follow byte holds the ASCII value of the letter that the diacritic is placed on.

ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf.

ISO 6937/2 defines 327 characters found in modern European languages that use the Latin alphabet. Non-Latin European characters, such as Cyrillic and Greek, are not included in the standard. Some diacritics used with the Latin alphabet, such as the Romanian comma, are also not included; the cedilla is used instead, because no distinction between cedilla and comma below was made at the time.

IANA has registered the charset names ISO_6937-2-25 and ISO_6937-2-add for two (older) versions of this standard (plus control codes). In practice, however, this character encoding is unused on the Internet.

Single byte characters

The primary set (first half) originally followed ISO 646-IRV as it stood before the ISO/IEC 646:1991 revision, that is, mostly following ASCII but with character 0x24 still denoted as an "international currency sign" (¤) instead of the dollar sign ($). The 1992 edition of ITU T.51 permits existing CCITT services to continue to interpret 0x24 as the international currency sign, but stipulates that new telecommunication applications should use it for the dollar sign (i.e. following the current ISO 646-IRV) and should instead represent the international currency sign using the supplementary set.[2]

The supplementary set (second half) contains a selection of spacing and non-spacing graphic characters, additional symbols and some locations reserved for future standardisation.

Both of these are ISO/IEC 2022 graphical character sets, with the primary set being a 94-code set and the supplementary set being a 96-code set. In contexts where ISO 2022 code extension techniques are not in use, the primary set is designated as the G0 set and invoked over GL (0x20..0x7F), whereas the supplementary set is designated as the G2 set and invoked either over GR (0xA0..0xFF) in an 8-bit environment, or by using the control code 0x19 as a single-shift in a 7-bit environment.[3] This encoding of the Single Shift Two code matches its location in ISO-IR-106.[4]
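As a rough illustration of the two invocation methods, the following Python sketch rewrites 8-bit data into the 7-bit form by prefixing each GR byte with the single-shift code 0x19, and converts it back again. The function names are hypothetical, and the sketch assumes that the shifted code is simply the GR byte with its high bit cleared; it does not track any other ISO 2022 state, and it cannot distinguish a literal 0x19 in the input from a single-shift.

```python
SS2 = 0x19  # Single Shift Two, at the location described in the text above


def to_7bit(data: bytes) -> bytes:
    """Replace every GR byte (0xA0..0xFF) with SS2 followed by its 7-bit code."""
    out = bytearray()
    for b in data:
        if 0xA0 <= b <= 0xFF:
            out.append(SS2)
            out.append(b & 0x7F)  # assumption: clear the high bit
        elif b < 0x80:
            out.append(b)
        else:
            raise ValueError(f"byte 0x{b:02X} is outside GL and GR")
    return bytes(out)


def to_8bit(data: bytes) -> bytes:
    """Undo to_7bit: SS2 + code in 0x20..0x7F becomes a single GR byte."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == SS2:
            nxt = next(it, None)
            if nxt is None or not 0x20 <= nxt <= 0x7F:
                raise ValueError("SS2 must be followed by a code in 0x20..0x7F")
            out.append(nxt | 0x80)
        else:
            out.append(b)
    return bytes(out)
```

The single-shift affects only the one code that follows it, so in a two-byte accented letter only the diacritic byte is shifted; the base letter is already a G0 code.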

The ISO/IEC 2022 escape sequence to designate the supplementary set of ISO/IEC 6937 as the G2 set is ESC . R (hex 1B 2E 52).[2][5][6] The older ISO 6937/2:1983 supplementary set is registered as a 94-code set, and is designated to G2 with ESC * l (hex 1B 2A 6C).[5][7]
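The two designation sequences can be written out as byte strings and checked against the hex values quoted above. This is only a small sketch; the constant and function names are made up for illustration.

```python
ESC = 0x1B

# Designation sequences as quoted in the text above.
DESIGNATE_G2_CURRENT = bytes([ESC]) + b".R"  # 96-code supplementary set
DESIGNATE_G2_1983 = bytes([ESC]) + b"*l"     # older 94-code supplementary set


def hex_dump(seq: bytes) -> str:
    """Render a byte string as space-separated uppercase hex pairs."""
    return seq.hex(" ").upper()


print(hex_dump(DESIGNATE_G2_CURRENT))  # 1B 2E 52
print(hex_dump(DESIGNATE_G2_1983))     # 1B 2A 6C
```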

Two byte characters

Accented letters which are not allocated single codes in the primary or supplementary set are coded using two bytes. The first byte, the "non spacing diacritical mark", is followed by a letter from the primary set, for example:

small e with acute accent (é) = [Acute]+e

The ITU T.51 standard allocates column 4 of the supplementary set (i.e. 0xC0–0xCF when used in 8-bit format) to non-spacing diacritic characters.[2] However, ISO/IEC 6937 defines a fully specified character repertoire, mapping a list of composition sequences to ISO/IEC 10646 character names which match those defined in Unicode. The isolated nonspacing bytes are not included in this repertoire, although spacing variants of the diacritics not otherwise present in ASCII are included, with the ASCII space being the trail byte.[5][8] Hence, only certain combinations of lead byte and follow byte conform to the ISO/IEC standard.

This repertoire is also affixed to the ITU version of the specification as Annex A, although the ITU version does not reference it from the main text. It is described as a "unified superset" of the Latin-script character repertoires.[2] It corresponds to the repertoire of ISO/IEC 10367 when the ASCII, Latin-1 (or Latin-5), Latin-2 and supplementary Latin sets are used.[5]

This system also differs from the Unicode combining character system in that the diacritic code precedes the letter rather than following it, which makes it more similar to ANSEL.

A small anomaly is that Latin Small Letter G with Cedilla is coded as if it carried an acute accent, that is, with a 0xC2 lead byte. Because the descender of the lowercase letter interferes with a cedilla, the lowercase form is usually drawn with a turned comma above: Ģ ģ.
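The ordering difference and the ģ anomaly can be seen by going from Unicode to bytes. The sketch below handles only the acute lead byte 0xC2 mentioned above; `encode_acute` is a hypothetical helper, it relies on Python's `unicodedata` decomposition, and it does not check whether the base letter is actually part of the standard's repertoire.

```python
import unicodedata

ACUTE_LEAD = 0xC2           # lead byte for the acute accent, as cited above
COMBINING_ACUTE = "\u0301"  # Unicode combining acute accent
G_CEDILLA = "\u0123"        # LATIN SMALL LETTER G WITH CEDILLA


def encode_acute(ch: str) -> bytes:
    """Encode one precomposed letter as lead byte followed by base letter."""
    if ch == G_CEDILLA:
        # The anomaly: this letter travels under the acute lead byte.
        return bytes([ACUTE_LEAD, ord("g")])
    decomposed = unicodedata.normalize("NFD", ch)  # Unicode order: letter, mark
    if len(decomposed) != 2 or decomposed[1] != COMBINING_ACUTE:
        raise ValueError(f"{ch!r} is not a single letter with an acute accent")
    base = decomposed[0]
    if not base.isascii() or base == "g":
        # 0xC2 + "g" is already taken by the cedilla form above.
        raise ValueError(f"{ch!r} cannot be encoded by this sketch")
    return bytes([ACUTE_LEAD, ord(base)])  # diacritic first, then the letter
```

For é, Unicode decomposition yields the letter followed by the mark, while the encoded form places the byte for the mark first.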

In total 13 diacritical marks can be followed by the selected characters from the primary set:

Accent               Code   Second character             Result
Grave                0xC1   AEIOUaeiou                   ÀÈÌÒÙàèìòù
Acute                0xC2   ACEILNORSUYZacegilnorsuyz    ÁĆÉÍĹŃÓŔŚÚÝŹáćéģíĺńóŕśúýź
Circumflex           0xC3   ACEGHIJOSUWYaceghijosuwy     ÂĈÊĜĤÎĴÔŜÛŴŶâĉêĝĥîĵôŝûŵŷ
Tilde                0xC4   AINOUainou                   ÃĨÑÕŨãĩñõũ
Macron               0xC5   AEIOUaeiou                   ĀĒĪŌŪāēīōū
Breve                0xC6   AGUagu                       ĂĞŬăğŭ
Dot                  0xC7   CEGIZcegz                    ĊĖĠİŻċėġż
Umlaut or diæresis   0xC8   AEIOUYaeiouy                 ÄËÏÖÜŸäëïöüÿ
Ring                 0xCA   AUau                         ÅŮåů
Cedilla              0xCB   CGKLNRSTcklnrst              ÇĢĶĻŅŖŞŢçķļņŗşţ
Double Acute         0xCD   OUou                         ŐŰőű
Ogonek               0xCE   AEIUaeiu                     ĄĘĮŲąęįų
Caron                0xCF   CDELNRSTZcdelnrstz           ČĎĚĽŇŘŠŤŽčďěľňřšťž
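The table above can be turned into a small table-driven decoder. The following Python sketch is one possible reading, not a reference implementation: the pairing of each lead byte with a Unicode combining mark is an assumption made for illustration, composition relies on NFC normalization from `unicodedata`, and single-byte supplementary characters and the spacing diacritic forms are not handled.

```python
import unicodedata

# lead byte -> (assumed Unicode combining mark, permitted second characters)
TABLE = {
    0xC1: ("\u0300", "AEIOUaeiou"),                 # grave
    0xC2: ("\u0301", "ACEILNORSUYZacegilnorsuyz"),  # acute
    0xC3: ("\u0302", "ACEGHIJOSUWYaceghijosuwy"),   # circumflex
    0xC4: ("\u0303", "AINOUainou"),                 # tilde
    0xC5: ("\u0304", "AEIOUaeiou"),                 # macron
    0xC6: ("\u0306", "AGUagu"),                     # breve
    0xC7: ("\u0307", "CEGIZcegz"),                  # dot
    0xC8: ("\u0308", "AEIOUYaeiouy"),               # umlaut or diaeresis
    0xCA: ("\u030A", "AUau"),                       # ring
    0xCB: ("\u0327", "CGKLNRSTcklnrst"),            # cedilla
    0xCD: ("\u030B", "OUou"),                       # double acute
    0xCE: ("\u0328", "AEIUaeiu"),                   # ogonek
    0xCF: ("\u030C", "CDELNRSTZcdelnrstz"),         # caron
}


def compose(lead: int, letter: str) -> str:
    """Return the precomposed character for one lead byte and base letter."""
    mark, allowed = TABLE[lead]
    if letter not in allowed:
        raise ValueError(f"0x{lead:02X} + {letter!r} is not in the table")
    if lead == 0xC2 and letter == "g":
        mark = "\u0327"  # the anomaly: 0xC2 + g stands for g with cedilla
    return unicodedata.normalize("NFC", letter + mark)


def decode(data: bytes) -> str:
    """Decode ASCII bytes and two-byte accented letters; reject the rest."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in TABLE:
            if i + 1 >= len(data):
                raise ValueError("lead byte at end of input")
            out.append(compose(b, chr(data[i + 1])))
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} is not handled by this sketch")
    return "".join(out)
```

For example, `decode(b"caf\xc2e")` yields "café", while `decode(b"\xc1x")` raises `ValueError` because x does not appear in the grave row.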

Codepage layout

The reference to combining characters in the U+0300–U+036F range for the codes in the range 0xC1–0xCF below is subject to the caveats mentioned above; they cannot simply be mapped to the code points listed. Also, Unicode separates the character at 0xE2 into two code points, uppercase D with stroke and uppercase Eth; the corresponding lowercase letters (0xF2 and 0xF3) usually look different.

The older 1988 edition of ITU T.51 defined two versions of the supplementary set, with the first version lacking the non-breaking space, soft hyphen, not sign (¬) and broken bar (¦) present in the second version. The first version was defined as an extension of the T.61 supplementary set, and the second version as an extension of the first version.[9] The current (1992) edition only includes the second version, deprecates certain characters, and updates the primary set to the current ISO-646-IRV (ASCII), although existing telematic services are permitted to retain the older behaviour.[2]

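ISO/IEC 6937 or ITU T.51 (Latin)
Row  Cells, in code order
0x
1x
2x   SP ! " # $/¤[a] % & ' ( ) * + , - . /
3x   0 1 2 3 4 5 6 7 8 9 : ; < = > ?
4x   @ A B C D E F G H I J K L M N O
5x   P Q R S T U V W X Y Z [ \ ] ^ _
6x   ` a b c d e f g h i j k l m n o
7x   p q r s t u v w x y z { | } ~
8x
9x
Ax   NBSP ¡ ¢ £ $[b] ¥ #[b] § ¤ «
Bx   ° ± ² ³ × µ · ÷ » ¼ ½ ¾ ¿
Cx   ◌̀ ◌́ ◌̂ ◌̃ ◌̄ ◌̆ ◌̇ ◌̈ ◌̊ ◌̧ ◌̲[c] ◌̋ ◌̨ ◌̌
Dx   ¹ ® © ¬ ¦
Ex   Æ Đ/Ð ª Ħ [d] IJ Ŀ Ł Ø Œ º Þ Ŧ Ŋ ʼn
Fx   ĸ æ đ ð ħ ı ij ŀ ł ø œ ß þ ŧ ŋ SHY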

Videotex version

Main article: Videotex character set

The versions of the supplementary set used by the ITU T.101 standard for Videotex are based on the first supplementary set of the 1988 edition of T.51.

The default G2 set for Data Syntax 2 adds a ΅ at 0xC0, for combination with codes from a Greek primary set.[10]

The supplementary set for Data Syntax 3 adds non-spacing marks for a "vector overbar" and solidus, and several semigraphic characters.[11]

ETS 300 706 version

The ETS 300 706 standard for World System Teletext bases its G2 set on ISO 6937.[12] It is a superset of the supplementary set of T.61, and a superset of the first supplementary set of the 1988 edition of T.51, but collides with the current edition of T.51 in certain positions. Diacritic codes in the ETS version are specified as being "for association with" characters from the G0 set in use,[12] such as US-ASCII or BS_viewdata. This version is shown in the chart below.

World System Teletext, Latin G2 Set (ETS 300 706:1997)[12]
Row  Cells, in code order
Ax   SP ¡ ¢ £ $ ¥ # § ¤ «
Bx   ° ± ² ³ × µ · ÷ » ¼ ½ ¾ ¿
Cx   ◌̀ ◌́ ◌̂ ◌̃ ◌̄ ◌̆ ◌̇ ◌̣̈ ◌̣ ◌̊ ◌̧ ◌̲ ◌̋ ◌̨ ◌̌
Dx   ¹ ® © α
Ex   Æ Đ/Ð ª Ħ IJ Ŀ Ł Ø Œ º Þ Ŧ Ŋ ʼn
Fx   ĸ æ đ ð ħ ı ij ŀ ł ø œ ß þ ŧ ŋ

Footnotes

  a. Continued use for ¤ permitted for existing CCITT services only.[2]
  b. Permitted for existing CCITT services only; otherwise the ASCII representation should be used.[2]
  c. Noted in the ITU version of the standard as having existing use for underlined text, in combination with any other character including accented characters. The 1988 ITU edition includes this code,[9] but the 1992 ITU edition discourages sending it in favour of ANSI escape sequences, while stating that it should be correctly interpreted when received by applicable systems.[2] Previous editions of the ISO/IEC version of the standard also allowed combining this code with any character in the defined repertoire,[7] whereas more recent revisions do not include this code.[5]
  d. An early draft placed ȷ in this position.

References

  1. "T.51 : Latin based coded character sets for telematic services". www.itu.int. Archived from the original on 2019-10-08. Retrieved 2019-11-14.
  2. CCITT (1992-09-18). Latin based coded character sets for telematic services (1992 ed.). Recommendation T.51.
  3. ITU-T (1995-08-11). Recommendation T.51 (1992) Amendment 1.
  4. ITU (1985-08-01). Teletex Primary Set of Control Functions (PDF). ITSCJ/IPSJ. ISO-IR-106.
  5. ISO/IEC JTC 1/SC 2/WG 3 (1998-04-15). WD 6937, Coded graphic character set for text communication - Latin alphabet (PDF). JTC1/SC2/N454.
  6. ISO/IEC JTC 1/SC 2/WG 3 (1991-12-15). Supplementary Set of ISO/IEC 6937:1992 (PDF). ITSCJ/IPSJ. ISO-IR-156. (The left-hand side is US-ASCII.)
  7. ISO/TC97/SC2/WG4 (1985-01-10). Supplementary Set of Latin Alphabetic and non-Alphabetic Graphic Characters (PDF). ITSCJ/IPSJ. ISO-IR-90.
  8. Petersen, J. K. (2002-05-29). The Telecommunications Illustrated Dictionary. CRC Press. p. 888. ISBN 978-1-4200-4067-8.
  9. CCITT (1988). Coded character sets for telematic services (1988 ed.). Recommendation T.51.
  10. CCITT (1988-11-01). Supplementary Set of Graphic Characters for Videotex (PDF). ITSCJ/IPSJ. ISO-IR-70.
  11. CCITT (1986-11-30). Supplementary Set of Graphic Characters for CCITT Recommendation T.101, Data Syntax III (PDF). ITSCJ/IPSJ. ISO-IR-128.
  12. ETSI (1997). "15.6.3 Latin G2 Set". Enhanced Teletext specification (PDF). p. 116. ETS 300 706.

Retrieved from "https://en.wikipedia.org/w/index.php?title=T.51/ISO/IEC_6937&oldid=1240864768"