T.51/ISO/IEC 6937

From Wikipedia, the free encyclopedia

ITU-T Recommendation

T.51
Latin based coded character sets for telematic services
Status	In force
Year started	1984
Latest version	(09/92) September 1992
Organization	ITU-T
Committee	Study Group VIII
Related standards	T.61, ETS 300 706, ISO/IEC 10367, ISO/IEC 2022, ISO 5426
Domain	encoding
License	Freely available
Website	https://www.itu.int/rec/T-REC-T.51

T.51
Alias(es)	Code page 20269, ISO-IR-90 (old), ISO-IR-142 (old), ISO-IR-156
Standard	ISO/IEC 6937, ITU T.51
Based on	ITU T.61
Other related encoding(s)	ETS 300 706, ISO 5426, NeXT Multinational, PostScript Standard Encoding, ITU T.101

T.51 / ISO/IEC 6937:2001, Information technology — Coded graphic character set for text communication — Latin alphabet, is a multibyte extension of ASCII, or more precisely ISO/IEC 646-IRV.^[1] It was developed jointly with ITU-T (then CCITT) for telematic services under the name T.51, and first became an ISO standard in 1983. Certain byte values serve as lead bytes for letters with diacritics: the value of the lead byte usually indicates which diacritic the letter carries, and the following byte holds the ASCII value of the base letter.

ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Loek Zeckendorf.

ISO 6937/2 defines 327 characters found in modern European languages using the Latin alphabet. Non-Latin European characters, such as Cyrillic and Greek, are not included in the standard. Some diacritics used with the Latin alphabet, such as the Romanian comma, are also not included; the cedilla is used instead, because no distinction between cedilla and comma below was made at the time.

IANA has registered the charset names ISO_6937-2-25 and ISO_6937-2-add for two (older) versions of this standard (plus control codes). In practice, however, this character encoding is unused on the Internet.

Single byte characters

The primary set (first half) originally followed ISO 646-IRV before the ISO/IEC 646:1991 revision, that is, mostly following ASCII but with character 0x24 still denoted as an "international currency sign" (¤) instead of the dollar sign ($). The 1992 edition of ITU T.51 permits existing CCITT services to continue to interpret 0x24 as the international currency sign, but stipulates that new telecommunication applications should use it for the dollar sign (i.e. following the current ISO 646-IRV), and instead represent the international currency sign using the supplementary set.^[2]
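
The two interpretations of 0x24 can be illustrated with a minimal Python sketch. The function names and the `existing_ccitt_service` flag are hypothetical, and the supplementary-set position 0xA8 for ¤ is read off the 8-bit code chart later in this article rather than quoted from the standard's text.

```python
DOLLAR = "$"
CURRENCY = "\u00a4"  # international currency sign

def decode_0x24(existing_ccitt_service: bool = False) -> str:
    """Interpret primary-set byte 0x24 (hypothetical helper).

    Existing CCITT services may keep reading it as the currency sign;
    new applications read it as the dollar sign.
    """
    return CURRENCY if existing_ccitt_service else DOLLAR

def encode_currency_sign(existing_ccitt_service: bool = False) -> bytes:
    """Encode the currency sign (hypothetical helper, 8-bit environment).

    Assumption: 0xA8 is the supplementary-set position of the currency
    sign, taken from the code chart in this article.
    """
    return b"\x24" if existing_ccitt_service else b"\xa8"
```

Under these assumptions, a new application decodes 0x24 as `$` and writes ¤ as 0xA8, while a legacy service keeps using 0x24 for ¤.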

The supplementary set (second half) contains a selection of spacing and non-spacing graphic characters, additional symbols and some locations reserved for future standardisation.

Both of these are ISO/IEC 2022 graphical character sets, with the primary set being a 94-code set and the supplementary set being a 96-code set. In contexts where ISO 2022 code extension techniques are not in use, the primary set is designated as the G0 set and invoked over GL (0x20..0x7F), whereas the supplementary set is designated as the G2 set and invoked over GR (0xA0..0xFF) in an 8-bit environment, or by using the control code 0x19 as a single-shift in a 7-bit environment.^[3] This encoding of the Single Shift Two code matches its location in ISO-IR-106.^[4]
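
As a rough illustration of the 7-bit arrangement, the following Python sketch converts between the 8-bit form (supplementary set in GR) and the 7-bit form (each supplementary code preceded by the single shift 0x19). The function names are hypothetical, and rejecting bytes in 0x80..0x9F is a simplification made for this example, not something taken from the standard.

```python
SS2 = 0x19  # Single Shift Two, as located in this context

def to_7bit(data: bytes) -> bytes:
    """Replace each GR byte (0xA0..0xFF) with SS2 plus the 7-bit code."""
    out = bytearray()
    for b in data:
        if b >= 0xA0:
            out.append(SS2)
            out.append(b & 0x7F)  # clear the high bit
        elif b >= 0x80:
            # Simplification: this sketch does not handle 0x80..0x9F.
            raise ValueError(f"unsupported byte 0x{b:02X}")
        else:
            out.append(b)
    return bytes(out)

def to_8bit(data: bytes) -> bytes:
    """Inverse of to_7bit: fold each SS2-prefixed code back into GR."""
    out = bytearray()
    it = iter(data)
    for b in it:
        if b == SS2:
            shifted = next(it, None)
            if shifted is None or not 0x20 <= shifted <= 0x7F:
                raise ValueError("single shift not followed by a graphic code")
            out.append(shifted | 0x80)
        else:
            out.append(b)
    return bytes(out)
```

For a two-byte accented letter only the lead byte belongs to the supplementary set, so only it receives the shift: the 8-bit sequence `C2 65` becomes `19 42 65` in 7-bit form.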

The ISO/IEC 2022 escape sequence to designate the supplementary set of ISO/IEC 6937 as the G2 set is ESC . R (hex 1B 2E 52).^[2]^[5]^[6] The older ISO 6937/2:1983 supplementary set is registered as a 94-code set, and designated to G2 with ESC * l (hex 1B 2A 6C).^[5]^[7]
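
The two designation sequences can be written out as byte strings. This is a small Python sketch; the constant and function names are illustrative only.

```python
from typing import Optional

ESC = 0x1B

# ESC . R designates the current (96-code) supplementary set as G2.
G2_SUPPLEMENTARY_96 = bytes([ESC, 0x2E, 0x52])
# ESC * l designates the older ISO 6937/2:1983 (94-code) set as G2.
G2_SUPPLEMENTARY_1983 = bytes([ESC, 0x2A, 0x6C])

def identify_designation(seq: bytes) -> Optional[str]:
    """Name the G2 designation a three-byte escape sequence stands for."""
    if seq == G2_SUPPLEMENTARY_96:
        return "ISO/IEC 6937 supplementary set (96-code) as G2"
    if seq == G2_SUPPLEMENTARY_1983:
        return "ISO 6937/2:1983 supplementary set (94-code) as G2"
    return None
```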

Two byte characters

Accented letters which are not allocated single codes in the primary or supplementary set are coded using two bytes. The first byte, the "non spacing diacritical mark", is followed by a letter from the base set e.g.:

small e with acute accent (é) = [Acute]+e

The ITU T.51 standard allocates column 4 of the supplementary set (i.e. 0xC0–CF when used in 8-bit format) to non-spacing diacritic characters.^[2] However, ISO/IEC 6937 defines a fully specified character repertoire, mapping a list of composition sequences to ISO/IEC 10646 character names which match those defined in Unicode. The isolated nonspacing bytes are not included in this repertoire, although spacing variants of the diacritics not otherwise present in ASCII are included, with the ASCII space being the trail byte.^[5]^[8] Hence, only certain combinations of lead byte and follow byte conform to the ISO/IEC standard.

This repertoire is also affixed to the ITU version of the specification as Annex A, although the ITU version does not reference it from the main text. It is described as a "unified superset" of the Latin-script character repertoires.^[2] It corresponds to the repertoire of ISO/IEC 10367 when the ASCII, Latin-1 (or Latin-5), Latin-2 and supplementary Latin sets are used.^[5]

This system also differs from the Unicode combining character system in that the diacritic code precedes the letter (as opposed to following it), making it more similar to ANSEL.

A small anomaly is that Latin Small Letter G with Cedilla is coded as if it carried an acute accent, that is, with a 0xC2 lead byte: because the descender of the lowercase letter interferes with a cedilla, the lowercase form is usually written with a turned comma above (Ģ ģ).
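
The ordering difference and the ģ anomaly can be seen in a minimal Python encoder sketch. It assumes an 8-bit environment, covers only two of the diacritical marks, and does not check that the resulting pair belongs to the standard's repertoire; `encode_letter` and `LEAD_BYTE` are hypothetical names.

```python
import unicodedata

# Lead bytes (8-bit form) for two of the marks, keyed by the Unicode
# combining character they correspond to. Illustrative subset only.
LEAD_BYTE = {
    "\u0301": 0xC2,  # combining acute accent
    "\u0327": 0xCB,  # combining cedilla
}

def encode_letter(ch: str) -> bytes:
    """Encode one letter as ASCII or as a lead byte plus base letter."""
    if ch == "\u0123":
        # Anomaly: small g with cedilla uses the acute lead byte.
        return bytes([0xC2, ord("g")])
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) == 1 and ord(decomposed) < 0x80:
        return decomposed.encode("ascii")
    if len(decomposed) == 2:
        base, mark = decomposed  # Unicode order: base letter, then mark
        if mark in LEAD_BYTE and ord(base) < 0x80:
            # T.51 order: mark (lead byte) first, then base letter.
            return bytes([LEAD_BYTE[mark], ord(base)])
    raise ValueError(f"not covered by this sketch: {ch!r}")
```

Unicode decomposes é into `e` followed by U+0301, whereas the sketch emits the acute lead byte 0xC2 first and `e` second.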

In total 13 diacritical marks can be followed by the selected characters from the primary set:

Accent	Code	Second character	Result
Grave	0xC1	AEIOUaeiou	ÀÈÌÒÙàèìòù
Acute	0xC2	ACEILNORSUYZacegilnorsuyz	ÁĆÉÍĹŃÓŔŚÚÝŹáćéģíĺńóŕśúýź
Circumflex	0xC3	ACEGHIJOSUWYaceghijosuwy	ÂĈÊĜĤÎĴÔŜÛŴŶâĉêĝĥîĵôŝûŵŷ
Tilde	0xC4	AINOUainou	ÃĨÑÕŨãĩñõũ
Macron	0xC5	AEIOUaeiou	ĀĒĪŌŪāēīōū
Breve	0xC6	AGUagu	ĂĞŬăğŭ
Dot	0xC7	CEGIZcegz	ĊĖĠİŻċėġż
Umlaut or diæresis	0xC8	AEIOUYaeiouy	ÄËÏÖÜŸäëïöüÿ

Ring	0xCA	AUau	ÅŮåů
Cedilla	0xCB	CGKLNRSTcklnrst	ÇĢĶĻŅŖŞŢçķļņŗşţ

Double Acute	0xCD	OUou	ŐŰőű
Ogonek	0xCE	AEIUaeiu	ĄĘĮŲąęįų
Caron	0xCF	CDELNRSTZcdelnrstz	ČĎĚĽŇŘŠŤŽčďěľňřšťž
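
The table above can be turned into a small decoder. The following Python sketch assumes an 8-bit byte stream containing only ASCII and two-byte accented letters; other supplementary-set characters and the spacing variants of the diacritics are out of scope, and the names `DIACRITICS` and `decode_two_byte` are hypothetical. It relies on Unicode NFC normalization to compose each base letter with its mark, with the ģ anomaly handled explicitly.

```python
import unicodedata

# Lead byte -> (Unicode combining character, permitted base letters),
# transcribed from the table above.
DIACRITICS = {
    0xC1: ("\u0300", "AEIOUaeiou"),                 # grave
    0xC2: ("\u0301", "ACEILNORSUYZacegilnorsuyz"),  # acute
    0xC3: ("\u0302", "ACEGHIJOSUWYaceghijosuwy"),   # circumflex
    0xC4: ("\u0303", "AINOUainou"),                 # tilde
    0xC5: ("\u0304", "AEIOUaeiou"),                 # macron
    0xC6: ("\u0306", "AGUagu"),                     # breve
    0xC7: ("\u0307", "CEGIZcegz"),                  # dot above
    0xC8: ("\u0308", "AEIOUYaeiouy"),               # umlaut or diaeresis
    0xCA: ("\u030a", "AUau"),                       # ring
    0xCB: ("\u0327", "CGKLNRSTcklnrst"),            # cedilla
    0xCD: ("\u030b", "OUou"),                       # double acute
    0xCE: ("\u0328", "AEIUaeiu"),                   # ogonek
    0xCF: ("\u030c", "CDELNRSTZcdelnrstz"),         # caron
}

def decode_two_byte(data: bytes) -> str:
    """Decode ASCII plus lead-byte/base-letter pairs to a Unicode string."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif b in DIACRITICS:
            if i + 1 >= len(data):
                raise ValueError("lead byte at end of input")
            mark, allowed = DIACRITICS[b]
            base = chr(data[i + 1])
            if base not in allowed:
                raise ValueError(f"0x{b:02X} cannot precede {base!r}")
            if b == 0xC2 and base == "g":
                out.append("\u0123")  # anomaly: decodes to g with cedilla
            else:
                # Unicode puts the mark after the base; NFC composes them.
                out.append(unicodedata.normalize("NFC", base + mark))
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02X} not handled by this sketch")
    return "".join(out)
```

For example, the bytes `63 61 66 C2 65` decode to "café", while `C1 7A` is rejected because z is not listed as a base letter for the grave accent.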

Codepage layout

The references to combining characters in the U+0300–U+036F range for the codes 0xC1–0xCF below are subject to the caveats mentioned above; the codes cannot simply be mapped to the code points listed. Also, Unicode splits 0xE2 into two characters, uppercase D with stroke and uppercase Eth, whose lowercase forms (0xF2 and 0xF3) usually look different.

The older 1988 edition of ITU T.51 defined two versions of the supplementary set, with the first version lacking the non-breaking space, soft hyphen, not sign (¬) and broken bar (¦) present in the second version. The first version was defined as an extension of the T.61 supplementary set, and the second version as an extension of the first version.^[9] The current (1992) edition only includes the second version, deprecates certain characters, and updates the primary set to the current ISO-646-IRV (ASCII), although existing telematic services are permitted to retain the older behaviour.^[2]

ISO/IEC 6937 or ITU T.51 (Latin)
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0x
1x
2x	SP	!	"	#	$/¤^[a]	%	&	'	(	)	*	+	,	-	.	/
3x	0	1	2	3	4	5	6	7	8	9	:	;	<	=	>	?
4x	@	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O
5x	P	Q	R	S	T	U	V	W	X	Y	Z	[	\	]	^	_
6x	`	a	b	c	d	e	f	g	h	i	j	k	l	m	n	o
7x	p	q	r	s	t	u	v	w	x	y	z	{	|	}	~
8x
9x
Ax	NBSP	¡	¢	£	$^[b]	¥	#^[b]	§	¤	‘	“	«	←	↑	→	↓
Bx	°	±	²	³	×	µ	¶	·	÷	’	”	»	¼	½	¾	¿
Cx		◌̀	◌́	◌̂	◌̃	◌̄	◌̆	◌̇	◌̈		◌̊	◌̧	◌̲^[c]	◌̋	◌̨	◌̌
Dx	―	¹	®	©	™	♪	¬	¦					⅛	⅜	⅝	⅞
Ex	Ω	Æ	Đ/Ð	ª	Ħ	^[d]	Ĳ	Ŀ	Ł	Ø	Œ	º	Þ	Ŧ	Ŋ	ŉ
Fx	ĸ	æ	đ	ð	ħ	ı	ĳ	ŀ	ł	ø	œ	ß	þ	ŧ	ŋ	SHY

Videotex version

Main article: Videotex character set

The versions of the supplementary set used by the ITU T.101 standard for Videotex are based on the first supplementary set of the 1988 edition of T.51.

The default G2 set for Data Syntax 2 adds a ΅ at 0xC0, for combination with codes from a Greek primary set.^[10]

The supplementary set for Data Syntax 3 adds non-spacing marks for a "vector overbar" and solidus, and several semigraphic characters.^[11]

ETS 300 706 version

The ETS 300 706 standard for World System Teletext bases its G2 set on ISO 6937.^[12] It is a superset of the supplementary set of T.61, and a superset of the first supplementary set of the 1988 edition of T.51, but collides with the current edition of T.51 in certain positions. Diacritic codes in the ETS version are specified as being "for association with" characters from the G0 set in use,^[12] such as US-ASCII or BS_viewdata. This version is shown in the chart below.

World System Teletext, Latin G2 Set (ETS 300 706:1997)^[12]
	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
Ax	SP	¡	¢	£	$	¥	#	§	¤	‘	“	«	←	↑	→	↓
Bx	°	±	²	³	×	µ	¶	·	÷	’	”	»	¼	½	¾	¿
Cx		◌̀	◌́	◌̂	◌̃	◌̄	◌̆	◌̇	◌̈	◌̣	◌̊	◌̧	◌̲	◌̋	◌̨	◌̌
Dx	―	¹	®	©	™	♪	₠	‰	α				⅛	⅜	⅝	⅞
Ex	Ω	Æ	Đ/Ð	ª	Ħ		Ĳ	Ŀ	Ł	Ø	Œ	º	Þ	Ŧ	Ŋ	ŉ
Fx	ĸ	æ	đ	ð	ħ	ı	ĳ	ŀ	ł	ø	œ	ß	þ	ŧ	ŋ	■

Footnotes

^[a] Continued use for ¤ permitted for existing CCITT services only.^[2]
^[b] Permitted for existing CCITT services only; otherwise the ASCII representation should be used.^[2]
^[c] Noted in the ITU version of the standard as having existing use for underlined text, in combination with any other character including accented characters. The 1988 ITU edition includes this code,^[9] but the 1992 ITU edition discourages sending it in favour of ANSI escape sequences, while stating that it should be correctly interpreted when received by applicable systems.^[2] Previous editions of the ISO/IEC version of the standard also allowed combining this code with any character in the defined repertoire,^[7] whereas more recent revisions do not include this code.^[5]
^[d] An early draft placed ȷ in this position.

References

^[1] "T.51 : Latin based coded character sets for telematic services". www.itu.int. Archived from the original on 2019-10-08. Retrieved 2019-11-14.
^[2] CCITT (1992-09-18). Latin based coded character sets for telematic services (1992 ed.). Recommendation T.51.
^[3] ITU-T (1995-08-11). Recommendation T.51 (1992) Amendment 1.
^[4] ITU (1985-08-01). Teletex Primary Set of Control Functions (PDF). ITSCJ/IPSJ. ISO-IR-106.
^[5] ISO/IEC JTC 1/SC 2/WG 3 (1998-04-15). WD 6937, Coded graphic character set for text communication - Latin alphabet (PDF). JTC1/SC2/N454.
^[6] ISO/IEC JTC 1/SC 2/WG 3 (1991-12-15). Supplementary Set of ISO/IEC 6937:1992 (PDF). ITSCJ/IPSJ. ISO-IR-156. (The left-hand side is US-ASCII.)
^[7] ISO/TC97/SC2/WG4 (1985-01-10). Supplementary Set of Latin Alphabetic and non-Alphabetic Graphic Characters (PDF). ITSCJ/IPSJ. ISO-IR-90.
^[8] Petersen, J. K. (2002-05-29). The Telecommunications Illustrated Dictionary. CRC Press. p. 888. ISBN 978-1-4200-4067-8.
^[9] CCITT (1988). Coded character sets for telematic services (1988 ed.). Recommendation T.51.
^[10] CCITT (1988-11-01). Supplementary Set of Graphic Characters for Videotex (PDF). ITSCJ/IPSJ. ISO-IR-70.
^[11] CCITT (1986-11-30). Supplementary Set of Graphic Characters for CCITT Recommendation T.101, Data Syntax III (PDF). ITSCJ/IPSJ. ISO-IR-128.
^[12] ETSI (1997). "15.6.3 Latin G2 Set". Enhanced Teletext specification (PDF). p. 116. ETS 300 706.

External links

ITU Recommendation T.51
ISO pages: ISO 6937-1:1983, ISO 6937-2:1983, ISO 6937-2:1983/Add 1:1989, ISO/IEC 6937:1994, ISO/IEC 6937:2001
WD 6937, Coded graphic character set for text communication - Latin alphabet (Revision of ISO/IEC 6937:1994) (ISO/IEC 6937:1994 draft)
ISO-IR-156 (ISO-IR registration of right-hand part)
