ISO/IEC 8859-1 code page layout | |
MIME / IANA | ISO-8859-1 |
---|---|
Alias(es) | iso-ir-100, csISOLatin1, latin1, l1, IBM819, CP819 |
Language(s) | English, various others |
Standard | ISO/IEC 8859 |
Classification | Extended ASCII, ISO/IEC 8859 |
Extends | US-ASCII |
Based on | DEC MCS |
Succeeded by | |
Other related encoding(s) | |
ISO/IEC 8859-1:1998, Information technology—8-bit single-byte coded graphic character sets—Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings; its first edition was published in 1987. ISO/IEC 8859-1 encodes what it refers to as "Latin alphabet no. 1", consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is the basis for some popular 8-bit character sets and for the first two blocks of characters in Unicode.
As of December 2024, 1.1% of all web sites use ISO/IEC 8859-1.[1][2] It is the most commonly declared single-byte character encoding, but because Web browsers and the HTML5 standard[3] interpret documents declared this way as the superset Windows-1252, these documents may include characters from that set. Some countries or languages show higher usage than the global average: in 2024, usage among websites in Brazil was 2.9%,[4] and in Germany 2.5%.[5][6]
ISO-8859-1 was (according to the standard, at least) the default encoding of documents delivered via HTTP with a MIME type beginning with text/ and the default encoding of the values of certain descriptive HTTP headers; it also defined the repertoire of characters allowed in HTML 3.2 documents. It is specified by many other standards.[example needed] In practice, the superset encoding Windows-1252 is the more likely effective default,[citation needed] and it is increasingly common for UTF-8 to work[clarification needed] whether or not a standard specifies it.[citation needed]
ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429. The following other aliases are registered: iso-ir-100, csISOLatin1, latin1, l1, IBM819. Code page 28591, also known as Windows-28591, is used for it in Windows.[7] IBM calls it code page 819 or CP819 (CCSID 819).[8][9][10][11] Oracle calls it WE8ISO8859P1.[12]
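As a rough illustration of how such aliases behave in practice, the sketch below asks Python's codec registry to resolve several of the names mentioned above. It assumes CPython's standard `codecs` module; the `ALIASES` list and the `canonical_names` helper are illustrative names, and vendor-specific names such as Windows-28591 or WE8ISO8859P1 are left out because Python's registry is not expected to know them.

```python
import codecs

# Names taken from the paragraph above; all should resolve to the same codec.
ALIASES = ["ISO-8859-1", "iso-ir-100", "csISOLatin1", "latin1", "l1", "IBM819", "CP819"]


def canonical_names(aliases):
    """Map each alias to the canonical codec name reported by Python's registry."""
    return {alias: codecs.lookup(alias).name for alias in aliases}


names = canonical_names(ALIASES)
for alias, name in names.items():
    print(f"{alias:12} -> {name}")
```

Lookup is case-insensitive, so `IBM819` and `ibm819` are treated the same way.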
Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in the following languages (although the correct quotation marks may be missing for many of them, including German and Icelandic):
ISO-8859-1 was commonly used[citation needed] for certain languages, even though it lacks characters used by these languages. In most cases, only a few letters are missing or they are rarely used, and they can be replaced with characters that are in ISO-8859-1 using some form of typographic approximation. The following table lists such languages.
Language | Missing characters | Typical workaround | Supported by |
---|---|---|---|
Catalan | Ŀ, ŀ (deprecated) | L·, l· | |
Danish | Ǿ, ǿ (the accent is optional and ǿ is very rare) | Ø, ø or øe | |
Dutch | Ĳ, ĳ (debatable), j́ (in emphasized words like "blíj́f") | digraphs IJ, ij or ÿ; blíjf | |
Estonian, Finnish | Š, š, Ž, ž (only present in loanwords) | Sh, sh, Zh, zh | ISO-8859-15, Windows-1252 |
French | Œ, œ, and the very rare Ÿ | digraphs OE, oe; Y or Ý | ISO-8859-15, Windows-1252 |
German | ẞ (capital ß, used only in all capitals) | digraph SS or SZ | |
Hungarian | Ő, ő, Ű, ű | Ö, ö, Ü, ü or Õ, õ, Û, û (the characters replaced in 8859-2) | ISO-8859-2, Windows-1250 |
Irish (traditional orthography) | Ḃ, ḃ, Ċ, ċ, Ḋ, ḋ, Ḟ, ḟ, Ġ, ġ, Ṁ, ṁ, Ṗ, ṗ, Ṡ, ṡ, Ṫ, ṫ | Bh, bh, Ch, ch, Dh, dh, Fh, fh, Gh, gh, Mh, mh, Ph, ph, Sh, sh, Th, th | ISO-8859-14 |
Maltese | Ċ, ċ, Ġ, ġ, Ħ, ħ, Ż, ż | C, c, G, g, H, h, Z, z | ISO-8859-3 |
Welsh | Ẁ, ẁ, Ẃ, ẃ, Ŵ, ŵ, Ẅ, ẅ, Ỳ, ỳ, Ŷ, ŷ, Ÿ | W, w, Y, y, Ý, ý | ISO-8859-14 |
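The typographic approximation described in the table can be sketched as a simple substitution applied before encoding. The following is a minimal, hypothetical example: the `FALLBACKS` mapping and the `to_latin1` helper are illustrative names, and the mapping covers only a handful of rows (French, Estonian/Finnish, Hungarian, German). Letters whose workaround differs between languages, such as Ċ in Irish and Maltese, are deliberately left out.

```python
# Hypothetical fallback table built from a few rows of the table above.
FALLBACKS = {
    "\u0152": "OE", "\u0153": "oe",          # Œ, œ (French)
    "\u0160": "Sh", "\u0161": "sh",          # Š, š (Estonian, Finnish)
    "\u017d": "Zh", "\u017e": "zh",          # Ž, ž (Estonian, Finnish)
    "\u0150": "\u00d6", "\u0151": "\u00f6",  # Ő, ő -> Ö, ö (Hungarian)
    "\u0170": "\u00dc", "\u0171": "\u00fc",  # Ű, ű -> Ü, ü (Hungarian)
    "\u1e9e": "SS",                          # ẞ (German capital sharp s)
}


def to_latin1(text: str) -> bytes:
    """Replace known missing letters, then encode as ISO-8859-1.

    Characters with no fallback that are outside ISO-8859-1 still raise
    UnicodeEncodeError, so unexpected input is not silently mangled.
    """
    approximated = "".join(FALLBACKS.get(ch, ch) for ch in text)
    return approximated.encode("iso-8859-1")


print(to_latin1("\u0152uvre"))   # French "Œuvre"
print(to_latin1("Erd\u0151s"))   # Hungarian "Erdős"
```

The substitution is lossy: decoding the result does not restore the original letters.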
The letter ÿ, which appears in French only very rarely, mainly in city names such as L'Haÿ-les-Roses and never at the beginning of words, is included only in lowercase form. The slot corresponding to its uppercase form is occupied by the lowercase letter ß from the German language, which did not have an uppercase form at the time when the standard was created.
Typographical (6- or 9-shaped) quotation marks are missing, as are any baseline quotation marks used by some of the supported languages. Only « », " ", and ' ' are included. Some fonts will display the spacing grave accent (0x60) and the apostrophe (0x27) as a matching pair of oriented single quotation marks (see Quotation mark § Typewriters and early computers), but this is not considered part of the modern standard.
Only three superscript digits are encoded: ² at 0xB2, ³ at 0xB3, and ¹ at 0xB9; the superscript digit 0 and the digits 4–9 are missing. Additionally, none of the subscript digits are encoded. A workaround is to use rich text formatting for the digits not covered by this standard.
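The gap in the superscript digits can be checked mechanically. The sketch below, which assumes Python's built-in `iso-8859-1` codec, tries to encode each Unicode superscript and subscript digit and records the code value of those that succeed; `encodable` is an illustrative helper name.

```python
SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"  # superscript 0-9
SUBSCRIPTS = "".join(chr(0x2080 + d) for d in range(10))                        # subscript 0-9


def encodable(chars, encoding="iso-8859-1"):
    """Return {character: code value} for the characters the encoding can represent."""
    result = {}
    for ch in chars:
        try:
            result[ch] = ch.encode(encoding)[0]
        except UnicodeEncodeError:
            pass  # not part of the encoding
    return result


found = encodable(SUPERSCRIPTS)
for ch, value in found.items():
    print(f"{ch} at 0x{value:02X}")
print("subscripts encodable:", encodable(SUBSCRIPTS))
```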
The euro sign was first presented to the public on 12 December 1996.[13] Because this character set was introduced in 1987, it does not include the euro sign. Later character sets similar to ISO/IEC 8859-1, such as Windows-1252 and ISO/IEC 8859-15, include a euro sign.
ISO 8859-1 was based on the Multinational Character Set (MCS) used by Digital Equipment Corporation (DEC) in the popular VT220 terminal in 1983. It was developed within the European Computer Manufacturers Association (ECMA), and published in March 1985 as ECMA-94,[14] by which name it is still sometimes known. The second edition of ECMA-94 (June 1986)[15] also included ISO 8859-2, ISO 8859-3, and ISO 8859-4 as part of the specification.
The original draft of ISO 8859-1 placed French Œ and œ at code points 215 (0xD7) and 247 (0xF7), as in the MCS. However, the delegate from France, being neither a linguist nor a typographer, falsely stated that these are not independent French letters in their own right, but mere ligatures (like fi or fl). This position was supported by the delegate team from Bull Publishing Company, which did not regularly print French with Œ/œ in its house style at the time. An anglophone delegate from Canada insisted on retaining Œ/œ but was rebuffed by the French delegate and the team from Bull. These code points were soon filled with × and ÷ at the suggestion of the German delegation. Support for French was further reduced when it was again falsely stated that the letter ÿ is "not French", resulting in the absence of the capital Ÿ. In fact, the letter ÿ is found in a number of French proper names, and the capital letter has been used in dictionaries and encyclopedias.[16] These characters were added to ISO/IEC 8859-15:1999. BraSCII matches the original draft.
In 1985, Commodore adopted ECMA-94 for its new AmigaOS operating system.[17] The Seikosha MP-1300AI impact dot-matrix printer, used with the Amiga 1000, included this encoding.[citation needed]
In 1990, the first version of Unicode used the code points of ISO-8859-1 as the first 256 Unicode code points.
In 1992, the IANA registered the character map ISO_8859-1:1987, more commonly known by its preferred MIME name of ISO-8859-1 (note the extra hyphen compared with ISO 8859-1), a superset of ISO 8859-1, for use on the Internet. This map assigns the C0 and C1 control codes to the unassigned code values and thus provides for 256 characters via every possible 8-bit value.
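Because the IANA map covers every 8-bit value and Unicode reuses the same code points for its first 256 positions, decoding ISO-8859-1 amounts to reading each byte value as a code point. The following sketch checks this with Python's built-in `iso-8859-1` codec, which is assumed here to implement the full 256-value map including the C0 and C1 controls.

```python
all_bytes = bytes(range(256))

# Every possible byte value decodes without error ...
decoded = all_bytes.decode("iso-8859-1")

# ... to the Unicode code point with the same number ...
code_points = [ord(ch) for ch in decoded]
print(code_points == list(range(256)))

# ... and encoding restores the original bytes exactly.
round_trip = decoded.encode("iso-8859-1")
print(round_trip == all_bytes)
```

This lossless round trip is one reason the encoding is sometimes used to carry arbitrary binary data through text-oriented interfaces.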
Hex | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x | | | | | | | | | | | | | | | | |
1x | | | | | | | | | | | | | | | | |
2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
8x | | | | | | | | | | | | | | | | |
9x | | | | | | | | | | | | | | | | |
Ax | NBSP | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § | ¨ | © | ª | « | ¬ | SHY | ® | ¯ |
Bx | ° | ± | ² | ³ | ´ | µ | ¶ | · | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
Dx | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
Fx | ð | ñ | ò | ó | ô | õ | ö | ÷ | ø | ù | ú | û | ü | ý | þ | ÿ |
Empty cells (rows 0x, 1x, 8x and 9x, and code value 0x7F) are undefined. Code values 0xD7 and 0xF7 were undefined in the first release of ECMA-94 (1985);[14] in the original draft, Œ was at 0xD7 and œ was at 0xF7.
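The figure of 191 characters follows directly from the layout: the defined code values are 0x20 to 0x7E and 0xA0 to 0xFF. A short sketch of that arithmetic, with illustrative variable names:

```python
# Defined (graphic) code values per the table: 0x20-0x7E and 0xA0-0xFF.
ascii_graphic = range(0x20, 0x7F)   # SP through ~, 95 values
high_graphic = range(0xA0, 0x100)   # NBSP through ÿ, 96 values

total_defined = len(ascii_graphic) + len(high_graphic)
total_undefined = 256 - total_defined  # 0x00-0x1F, 0x7F, 0x80-0x9F

print(len(ascii_graphic), len(high_graphic), total_defined, total_undefined)
```

The 65 remaining values are the ones the IANA ISO-8859-1 map fills with the C0 and C1 control codes (including 0x7F).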
ISO/IEC 8859-15 was developed in 1999 as an update of ISO/IEC 8859-1. It provides some characters for French and Finnish text and the euro sign, which are missing from ISO/IEC 8859-1. This required the removal of some infrequently used characters from ISO/IEC 8859-1, including fraction symbols and letter-free diacritics: ¤, ¦, ¨, ´, ¸, ¼, ½, and ¾. Ironically, three of the newly added characters (Œ, œ, and Ÿ) had already been present in DEC's 1983 Multinational Character Set (MCS), the predecessor to ISO/IEC 8859-1 (1987). Since their original code points were now reused for other purposes, the characters had to be reintroduced under different, less logical code points.
ISO-IR-204, a more minor modification (called code page 61235 by FreeDOS),[18] had been registered in 1998, altering ISO-8859-1 by replacing the universal currency sign (¤) with the euro sign[19] (the same substitution made by ISO-8859-15).
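The code values at which ISO/IEC 8859-15 departs from ISO/IEC 8859-1 can be listed by decoding each byte with both encodings and comparing. This is a minimal sketch that assumes Python's built-in `iso-8859-1` and `iso-8859-15` codecs are faithful to the standards; `differences` is an illustrative helper name. ISO-IR-204 is not among Python's standard codecs, so it is not compared here.

```python
def differences(enc_a="iso-8859-1", enc_b="iso-8859-15"):
    """Return {code value: (char in enc_a, char in enc_b)} where the two encodings differ."""
    diffs = {}
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        a, b = raw.decode(enc_a), raw.decode(enc_b)
        if a != b:
            diffs[value] = (a, b)
    return diffs


diffs = differences()
for value, (old, new) in diffs.items():
    print(f"0x{value:02X}: {old} -> {new}")
```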
The popular Windows-1252 character set adds all the missing characters provided by ISO/IEC 8859-15, plus a number of typographic symbols, by replacing the rarely used C1 controls in the range 128 to 159 (hex 80 to 9F). It is very common for Windows-1252 text to be mislabelled as ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Many Web browsers and e-mail clients will interpret ISO-8859-1 control codes as Windows-1252 characters, and that behavior was later standardized in HTML5.[20]
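The effect of mislabelling can be reproduced by decoding the same bytes both ways. The sketch below assumes Python's built-in `iso-8859-1` and `cp1252` codecs; the sample byte string is made up for illustration.

```python
# Bytes as a Windows word processor might emit them: 0x93/0x94 are curly
# double quotes and 0x92 a curly apostrophe in Windows-1252.
raw = b"\x93smart quotes\x94 and an apostrophe\x92"

# Strict ISO-8859-1 reading: 0x92-0x94 become invisible C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

# Windows-1252 reading, as browsers apply to content labelled ISO-8859-1.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(as_cp1252)
```

Python's `cp1252` codec rejects the five byte values that Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D), so code that must accept arbitrary input the way a browser does may need `errors="replace"` or a fallback to `iso-8859-1`.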
The Apple Macintosh computer introduced a character encoding called Mac Roman in 1984. It was meant to be suitable for Western European desktop publishing. It is a superset of ASCII, and has most of the characters that are in ISO-8859-1 and all the extra characters from Windows-1252, but in a totally different arrangement. The few printable characters that are in ISO/IEC 8859-1, but not in this set, are often a source of trouble when editing text on Web sites using older Macintosh browsers, including the last version of Internet Explorer for Mac.
DOS has code page 850, which has all printable characters that ISO-8859-1 has, albeit in a totally different arrangement, plus the most widely used graphic characters from code page 437.
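Whether another encoding covers the printable repertoire of ISO-8859-1 can be tested by trying to re-encode each printable character. The sketch below does so for code page 850 and Mac Roman, assuming Python's built-in `cp850` and `mac_roman` codecs match the encodings described; `missing_from` is an illustrative helper name.

```python
def missing_from(target, source="iso-8859-1"):
    """List printable characters of the source encoding that the target cannot encode."""
    missing = []
    printable_values = list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))
    for value in printable_values:
        ch = bytes([value]).decode(source)
        try:
            ch.encode(target)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print("cp850 lacks:", missing_from("cp850"))
print("mac_roman lacks:", missing_from("mac_roman"))
```

An empty list means every printable character survives conversion, although its code value will generally change.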
Between 1989[21] and 2015, Hewlett-Packard used another superset of ISO-8859-1 on many of their calculators. This proprietary character set was sometimes referred to simply as "ECMA-94" as well.[21] HP also has code page 1053, which adds the medium shade (▒, U+2592) at 0x7F.[22]
Several EBCDIC code pages were purposely designed to have the same set of characters as ISO-8859-1, to allow easy conversion between them.
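One way to check this property is to decode all 256 byte values with an EBCDIC codec and confirm that the result is a rearrangement of the first 256 Unicode code points. The sketch below assumes Python's built-in `cp037` and `cp500` codecs, chosen here only as examples of such EBCDIC code pages; `same_repertoire` is an illustrative helper name.

```python
def same_repertoire(ebcdic_codec):
    """True if the codec's 256 characters are exactly U+0000 to U+00FF, in any order."""
    decoded = bytes(range(256)).decode(ebcdic_codec)
    return sorted(decoded) == [chr(i) for i in range(256)]


for codec in ("cp037", "cp500"):
    print(codec, same_repertoire(codec))

# With identical repertoires, conversion is a lossless byte-for-byte remapping.
latin1_bytes = "Gr\u00fc\u00dfe".encode("iso-8859-1")
ebcdic_bytes = latin1_bytes.decode("iso-8859-1").encode("cp500")
restored = ebcdic_bytes.decode("cp500").encode("iso-8859-1")
print(restored == latin1_bytes)
```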
[…] Since 1982 the urgency of the need for an 8-bit single-byte coded character set was recognized in ECMA as well as in ANSI/X3L2 and numerous working papers were exchanged between the two groups. In February 1984 ECMA TC1 submitted to ISO/TC97/SC2 a proposal for such a coded character set. At its meeting of April 1984 SC decided to submit to TC97 a proposal for a new item of work for this topic. Technical discussions during and after this meeting led TC1 to adopt the coding scheme proposed by X3L2. Part 1 of Draft International Standard DTS 8859 is based on this joint ANSI/ECMA proposal. […] Adopted as an ECMA Standard by the General Assembly of Dec. 13–14, 1984. […]