Please refer to the errata for this document, which may include some normative corrections. See also translations.
This document is also available in these non-normative formats: Multi-part XHTML file, PostScript version, PDF version, ZIP archive, and Gzip'd TAR archive.
Copyright ©2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This specification defines the Second Edition of XHTML 1.0, a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4. The semantics of the elements and their attributes are defined in the W3C Recommendation for HTML 4. These semantics provide the foundation for future extensibility of XHTML. Compatibility with existing HTML user agents is possible by following a small set of guidelines.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This specification is a Superseded Recommendation. A newer specification exists that is recommended for new adoption in place of this specification. New implementations should follow the latest version of the HTML specification.
This document is the second edition of the XHTML 1.0 specification incorporating the errata changes as of 1 August 2002. Changes between this version and the previous Recommendation are illustrated in a diff-marked version.
This second edition is not a new version of XHTML 1.0 (first published 26 January 2000). The changes in this document reflect corrections applied as a result of comments submitted by the community and as a result of ongoing work within the HTML Working Group. There are no substantive changes in this document - only the integration of various errata.
This document has been produced as part of the W3C HTML Activity.
At the time of publication, the working group believed there were zero patent disclosures relevant to this specification. A current list of patent disclosures relevant to this specification may be found on the Working Group's patent disclosure page.
A list of current W3C Recommendations and other technical documents can be found at https://www.w3.org/TR/.
This section is informative.
XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4 [HTML4]. XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. The details of this family and its evolution are discussed in more detail in [XHTMLMOD].
XHTML 1.0 (this specification) is the first document type in the XHTML family. It is a reformulation of the three HTML 4 document types as applications of XML 1.0 [XML]. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents. Developers who migrate their content to XHTML 1.0 will realize the following benefits:
The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility.
HTML 4 [HTML4] is an SGML (Standard Generalized Markup Language) application conforming to International Standard ISO 8879, and is widely regarded as the standard publishing language of the World Wide Web.
SGML is a language for describing markup languages, particularly those used in electronic document exchange, document management, and document publishing. HTML is an example of a language defined in SGML.
SGML has been around since the mid-1980s and has remained quite stable. Much of this stability stems from the fact that the language is both feature-rich and flexible. This flexibility, however, comes at a price, and that price is a level of complexity that has inhibited its adoption in a diversity of environments, including the World Wide Web.
HTML, as originally conceived, was to be a language for the exchange of scientific and other technical documents, suitable for use by non-document specialists. HTML addressed the problem of SGML complexity by specifying a small set of structural and semantic tags suitable for authoring relatively simple documents. In addition to simplifying the document structure, HTML added support for hypertext. Multimedia capabilities were added later.
In a remarkably short space of time, HTML became wildly popular and rapidly outgrew its original purpose. Since HTML's inception, there has been rapid invention of new elements for use within HTML (as a standard) and for adapting HTML to vertical, highly specialized markets. This plethora of new elements has led to interoperability problems for documents across different platforms.
XML™ is the shorthand name for Extensible Markup Language [XML].
XML was conceived as a means of regaining the power and flexibility of SGML without most of its complexity. Although a restricted form of SGML, XML nonetheless preserves most of SGML's power and richness, and yet still retains all of SGML's commonly used features.
While retaining these beneficial features, XML removes many of the more complex features of SGML that make the authoring and design of suitable software both difficult and costly.
The benefits of migrating to XHTML 1.0 are described above. Some of the benefits of migrating to XHTML in general are:
This section is normative.
The following terms are used in this specification. These terms extend the definitions in [RFC2119] in ways based upon similar definitions in ISO/IEC 9945-1:1990 [POSIX.1]:
This section is normative.
This version of XHTML provides a definition of strictly conforming XHTML 1.0 documents, which are restricted to elements and attributes from the XML and XHTML 1.0 namespaces. See Section 3.1.2 for information on using XHTML with other namespaces, for instance, to include metadata expressed in RDF within XHTML documents.
A Strictly Conforming XHTML Document is an XML document that requires only the facilities described as mandatory in this specification. Such a document must meet all of the following criteria:
It must conform to the constraints expressed in one of the three DTDs found in DTDs and in Appendix B.
The root element of the document must be html.
The root element of the document must contain an xmlns declaration for the XHTML namespace [XMLNS]. The namespace for XHTML is defined to be http://www.w3.org/1999/xhtml. An example root element might look like:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
There must be a DOCTYPE declaration in the document prior to the root element. The public identifier included in the DOCTYPE declaration must reference one of the three DTDs found in DTDs using the respective Formal Public Identifier. The system identifier may be changed to reflect local system conventions.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
The DTD subset must not be used to override any parameter entities in the DTD.
An XML declaration is not required in all XML documents; however, XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol. Here is an example of an XHTML document. In this example, the XML declaration is included.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://example.org/">example.org</a>.</p>
  </body>
</html>
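Three of the criteria above (the html root element, the XHTML namespace, and a DOCTYPE naming one of the three Formal Public Identifiers) can be checked mechanically. The following is a minimal sketch in Python using the standard library's xml.dom.minidom. The function name structural_problems and the SAMPLE constant are illustrative assumptions, not part of this specification, and the sketch does not validate the document against the DTD.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Formal Public Identifiers of the three XHTML 1.0 DTDs named in the text.
XHTML1_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
}

# Hypothetical sample input, modelled on the example document above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Virtual Library</title></head>
  <body><p>Moved to <a href="http://example.org/">example.org</a>.</p></body>
</html>"""


def structural_problems(source):
    """Return a list of problems; an empty list means these three checks passed."""
    # parseString raises xml.parsers.expat.ExpatError if the input is not well-formed.
    document = minidom.parseString(source)
    problems = []
    root = document.documentElement
    if root.tagName != "html":
        problems.append("root element is not html")
    if root.namespaceURI != XHTML_NS:
        problems.append("root element is not in the XHTML namespace")
    doctype = document.doctype
    if doctype is None or doctype.publicId not in XHTML1_PUBLIC_IDS:
        problems.append("missing or unrecognized DOCTYPE public identifier")
    return problems


print(structural_problems(SAMPLE))
```

Passing these checks is necessary but not sufficient for strict conformance; the DTD constraints still need a validating parser.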
The XHTML namespace may be used with other XML namespaces as per [XMLNS], although such documents are not strictly conforming XHTML 1.0 documents as defined above. Work by W3C is addressing ways to specify conformance for documents involving multiple namespaces. For an example, see [XHTML+MathML].
The following example shows the way in which XHTML 1.0 could be used in conjunction with the MathML Recommendation:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>A Math Example</title>
  </head>
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply>
        <log/>
        <logbase>
          <cn> 3 </cn>
        </logbase>
        <ci> x </ci>
      </apply>
    </math>
  </body>
</html>
The following example shows the way in which XHTML 1.0 markup could be incorporated into another XML namespace:
<?xml version="1.0" encoding="UTF-8"?>
<!-- initially, the default namespace is "books" -->
<book xmlns='urn:loc.gov:books'
    xmlns:isbn='urn:ISBN:0-395-36341-6' xml:lang="en" lang="en">
  <title>Cheaper by the Dozen</title>
  <isbn:number>1568491379</isbn:number>
  <notes>
    <!-- make HTML the default namespace for a hypertext commentary -->
    <p xmlns='http://www.w3.org/1999/xhtml'>
        This is also available <a href="http://www.w3.org/">online</a>.
    </p>
  </notes>
</book>
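To see how a namespace-aware processor distinguishes the XHTML and MathML elements in a mixed document, the sketch below collects the namespace names of all elements using Python's xml.etree.ElementTree. The helper name namespaces_used and the DOC constant are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical input, condensed from the XHTML plus MathML example above.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>A Math Example</title></head>
  <body>
    <p>The following is MathML markup:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply><log/><logbase><cn> 3 </cn></logbase><ci> x </ci></apply>
    </math>
  </body>
</html>"""


def namespaces_used(source):
    """Collect the namespace names of every element in the document."""
    found = set()
    for element in ET.fromstring(source).iter():
        tag = element.tag
        if not isinstance(tag, str):
            continue  # skip comments or processing instructions, if present
        # ElementTree spells qualified names as "{namespace}localname".
        if tag.startswith("{"):
            found.add(tag[1:].split("}", 1)[0])
        else:
            found.add("")  # element in no namespace
    return found


print(namespaces_used(DOC) == {XHTML_NS, MATHML_NS})
```

A document for which this set contains anything other than the XHTML namespace is, by the definition above, not a strictly conforming XHTML 1.0 document.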
A conforming user agent must meet all of the following criteria:
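ID (i.e. the id attribute on most XHTML elements) as fragment identifiers.
White space is handled according to the following rules. The following characters are defined in [XML] as white space characters: SPACE (&#x0020;), HORIZONTAL TABULATION (&#x0009;), CARRIAGE RETURN (&#x000D;), and LINE FEED (&#x000A;).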
The XML processor normalizes different systems' line end codes into one single LINE FEED character, that is passed up to the application.
The user agent must use the definition from CSS for processing whitespace characters [CSS2]. Note that the CSS2 recommendation does not explicitly address the issue of whitespace handling in non-Latin character sets. This will be addressed in a future version of CSS, at which time this reference will be updated.
Note that in order to produce a Canonical XHTML document, the rules above must be applied and the rules in [XMLC14N] must also be applied to the document.
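The line-end normalization performed by the XML processor can be observed directly. The sketch below, which assumes Python's xml.etree.ElementTree (backed by the Expat parser) as the XML processor, feeds it character data containing CR LF, a lone CR, and a lone LF. The names SAMPLE and character_data are illustrative.

```python
import xml.etree.ElementTree as ET

# Character data mixing the three common line-end conventions:
# CR LF, a lone CR, and a lone LF.
SAMPLE = b"<p>line one\r\nline two\rline three\nline four</p>"


def character_data(source):
    """Return the text content that the XML processor hands to the application."""
    return ET.fromstring(source).text


# repr() makes the line-end characters visible.
print(repr(character_data(SAMPLE)))
```

The application never sees the original line-end codes, so any further white space processing only has to deal with LINE FEED.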
This section is informative.
Due to the fact that XHTML is an XML application, certain practices that were perfectly legal in SGML-based HTML 4 [HTML4] must be changed.
Well-formedness is a new concept introduced by [XML]. Essentially this means that all elements must either have closing tags or be written in a special form (as described below), and that all the elements must nest properly.
Although overlapping is illegal in SGML, it is widely tolerated in existing browsers.
CORRECT: nested elements.
<p>here is an emphasized <em>paragraph</em>.</p>
INCORRECT: overlapping elements
<p>here is an emphasized <em>paragraph.</p></em>
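An XML processor rejects overlapping elements outright. As a rough illustration, the following Python sketch uses xml.etree.ElementTree to test the two fragments above; the helper name is_well_formed is an assumption for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


nested = "<p>here is an emphasized <em>paragraph</em>.</p>"
overlapping = "<p>here is an emphasized <em>paragraph.</p></em>"

print(is_well_formed(nested), is_well_formed(overlapping))
```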
XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because XML is case-sensitive, e.g. <li> and <LI> are different tags.
In SGML-based HTML 4 certain elements were permitted to omit the end tag, with the elements that followed implying closure. XML does not allow end tags to be omitted. All elements other than those declared in the DTD as EMPTY must have an end tag. Elements that are declared in the DTD as EMPTY can have an end tag or can use empty element shorthand (see Empty Elements).
CORRECT: terminated elements
<p>here is a paragraph.</p><p>here is another paragraph.</p>
INCORRECT: unterminated elements
<p>here is a paragraph.<p>here is another paragraph.
All attribute values must be quoted, even those which appear to be numeric.
CORRECT: quoted attribute values
<td rowspan="3">
INCORRECT: unquoted attribute values
<td rowspan=3>
XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as compact and checked cannot occur in elements without their value being specified.
CORRECT: unminimized attributes
<dl compact="compact">
INCORRECT: minimized attributes
<dl compact>
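The quoting and minimization rules are both well-formedness constraints, so an XML parser enforces them before any DTD is consulted. The sketch below uses Python's xml.etree.ElementTree; the helper name parse_attributes is hypothetical, and end tags are added to the fragments so that each one is a complete element. Only well-formedness is checked here, not validity.

```python
import xml.etree.ElementTree as ET


def parse_attributes(markup):
    """Return the element's attributes as a dict, or None if XML rejects the markup."""
    try:
        return dict(ET.fromstring(markup).attrib)
    except ET.ParseError:
        return None


for markup in (
    '<td rowspan="3"></td>',        # quoted value: accepted
    "<td rowspan=3></td>",          # unquoted value: rejected
    '<dl compact="compact"></dl>',  # attribute written in full: accepted
    "<dl compact></dl>",            # minimized attribute: rejected
):
    print(markup, parse_attributes(markup))
```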
Empty elements must either have an end tag or the start tag must end with />. For instance, <br/> or <hr></hr>. See HTML Compatibility Guidelines for information on ways to ensure this is backward compatible with HTML 4 user agents.
CORRECT: terminated empty elements
<br/><hr/>
INCORRECT: unterminated empty elements
<br><hr>
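To an XML processor the two permitted forms of an empty element are equivalent, while the unterminated form is an error. A minimal sketch, assuming Python's xml.etree.ElementTree and two illustrative helper names (reserialize and parses):

```python
import xml.etree.ElementTree as ET


def reserialize(fragment):
    """Parse a fragment and write it back out in ElementTree's default form."""
    return ET.tostring(ET.fromstring(fragment), encoding="unicode")


def parses(fragment):
    """Return True if the fragment is well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Both spellings produce the same element, so they serialize identically.
print(reserialize("<br/>") == reserialize("<br></br>"))

# Terminated empty elements are accepted; unterminated ones are not.
print(parses("<div><br/><hr/></div>"), parses("<div><br><hr></div>"))
```

The choice between the two accepted forms therefore matters only for HTML user agents, not for XML processing.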
When user agents process attributes, they do so according to Section 3.3.3 of [XML]:
In XHTML, the script and style elements are declared as having #PCDATA content. As a result, < and & will be treated as the start of markup, and entities such as &lt; and &amp; will be recognized as entity references by the XML processor to < and & respectively. Wrapping the content of the script or style element within a CDATA marked section avoids the expansion of these entities.
<script type="text/javascript">
<![CDATA[
... unescaped script content ...
]]>
</script>
CDATA sections are recognized by the XML processor and appear as nodes in the Document Object Model, see Section 1.3 of the DOM Level 1 Recommendation [DOM].
An alternative is to use external script and style documents.
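The effect of a CDATA marked section on script content can be sketched as follows, using Python's xml.etree.ElementTree as a stand-in XML processor. The script body and the helper name script_text are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical script body containing both characters that XML treats as markup.
SCRIPT_BODY = "if (a < b && c) { run(); }"

unwrapped = '<script type="text/javascript">' + SCRIPT_BODY + "</script>"
wrapped = '<script type="text/javascript"><![CDATA[' + SCRIPT_BODY + "]]></script>"


def script_text(markup):
    """Return the script element's text, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


print(script_text(unwrapped))  # the bare "<" and "&" make this ill-formed
print(script_text(wrapped))    # the CDATA section delivers the body unchanged
```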
SGML gives the writer of a DTD the ability to exclude specific elements from being contained within an element. Such prohibitions (called "exclusions") are not possible in XML.
For example, the HTML 4 Strict DTD forbids the nesting of an 'a' element within another 'a' element to any descendant depth. It is not possible to spell out such prohibitions in XML. Even though these prohibitions cannot be defined in the DTD, certain elements should not be nested. A summary of such elements and the elements that should not be nested in them is found in the normative Element Prohibitions.
HTML 4 defined the name attribute for the elements a, applet, form, frame, iframe, img, and map. HTML 4 also introduced the id attribute. Both of these attributes are designed to be used as fragment identifiers.
In XML, fragment identifiers are of type ID, and there can only be a single attribute of type ID per element. Therefore, in XHTML 1.0 the id attribute is defined to be of type ID. In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the id attribute when defining fragment identifiers on the elements listed above. See the HTML Compatibility Guidelines for information on ensuring such anchors are backward compatible when serving XHTML documents as media type text/html.
Note that in XHTML 1.0, the name attribute of these elements is formally deprecated, and will be removed in a subsequent version of XHTML.
HTML 4 and XHTML both have some attributes that have pre-defined and limited sets of values (e.g. the type attribute of the input element). In SGML and XML, these are called enumerated attributes. Under HTML 4, the interpretation of these values was case-insensitive, so a value of TEXT was equivalent to a value of text. Under XML, the interpretation of these values is case-sensitive, and in XHTML 1 all of these values are defined in lower-case.
SGML and XML both permit references to characters by using hexadecimal values. In SGML these references could be made using either &#Xnn; or &#xnn;. In XML documents, you must use the lower-case version (i.e. &#xnn;).
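The case rule for hexadecimal character references can be demonstrated with any XML parser. The sketch below assumes Python's xml.etree.ElementTree; the helper name resolve is illustrative, and the Euro sign (U+20AC) is used as the sample character.

```python
import xml.etree.ElementTree as ET


def resolve(fragment):
    """Return the element text with references resolved, or None if ill-formed."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        return None


lower = resolve("<p>&#x20AC;</p>")  # lower-case "x": a legal XML reference
upper = resolve("<p>&#X20AC;</p>")  # upper-case "X": legal in SGML, not in XML

print(lower == "\u20ac", upper)
```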
This section is normative.
Although there is no requirement for XHTML 1.0 documents to be compatible with existing user agents, in practice this is easy to accomplish. Guidelines for creating compatible documents can be found in Appendix C.
XHTML Documents which follow the guidelines set forth in Appendix C, "HTML Compatibility Guidelines" may be labeled with the Internet Media Type "text/html" [RFC2854], as they are compatible with most HTML browsers. Those documents, and any other document conforming to this specification, may also be labeled with the Internet Media Type "application/xhtml+xml" as defined in [RFC3236]. For further information on using media types with XHTML, see the informative note [XHTMLMIME].
This appendix is normative.
These DTDs and entity sets form a normative part of this specification. The complete set of DTD files together with an XML declaration and SGML Open Catalog is included in the zip file and the gzip'd tar file for this specification. Users looking for local copies of the DTDs to work with should download and use those archives rather than using the specific DTDs referenced below.
These DTDs approximate the HTML 4 DTDs. The W3C recommends that you use the authoritative versions of these DTDs at their defined SYSTEM identifiers when validating content. If you need to use these DTDs locally you should download one of the archives of this version. For completeness, the normative versions of the DTDs are included here:
The file DTD/xhtml1-strict.dtd is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The file DTD/xhtml1-transitional.dtd is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The file DTD/xhtml1-frameset.dtd is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The XHTML entity sets are the same as for HTML 4, but have been modified to be valid XML 1.0 entity declarations. Note the entity for the Euro currency sign (&euro; or &#8364; or &#x20AC;) is defined as part of the special characters.
The file DTD/xhtml-lat1.ent is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The file DTD/xhtml-special.ent is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
The file DTD/xhtml-symbol.ent is a normative part of this specification. The annotated contents of this file are available in this separate section for completeness.
This appendix is normative.
The following elements have prohibitions on which elements they can contain (see SGML Exclusions). This prohibition applies to all depths of nesting, i.e. it contains all the descendant elements.
a must not contain other a elements.
pre must not contain the img, object, big, small, sub, or sup elements.
button must not contain the input, select, textarea, label, button, form, fieldset, iframe or isindex elements.
label must not contain other label elements.
form must not contain other form elements.
This appendix is informative.
This appendix summarizes design guidelines for authors who wish their XHTML documents to render on existing HTML user agents. Note that this recommendation does not define how HTML conforming user agents should process HTML documents. Nor does it define the meaning of the Internet Media Type text/html. For these definitions, see [HTML4] and [RFC2854] respectively.
Be aware that processing instructions are rendered on some user agents. Also, some user agents interpret the XML declaration to mean that the document is unrecognized XML rather than HTML, and therefore may not render the document as expected. For compatibility with these types of legacy browsers, you may want to avoid using processing instructions and XML declarations. Remember, however, that when the XML declaration is not included in a document, the document can only use the default character encodings UTF-8 or UTF-16.
Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />. Also, use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.
Given an empty instance of an element whose content model is not EMPTY (for example, an empty title or paragraph) do not use the minimized form (e.g. use <p> </p> and not <p />).
Use external style sheets if your style sheet uses < or & or ]]> or --. Use external scripts if your script uses < or & or ]]> or --. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within "comments" to make the documents backward compatible is likely to not work as expected in XML-based user agents.
Avoid line breaks and multiple white space characters within attribute values. These are handled inconsistently by user agents.
Don't include more than one isindex element in the document head. The isindex element is deprecated in favor of the input element.
Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence.
In XML, URI-references [RFC2396] that end with fragment identifiers of the form "#foo" do not refer to elements with an attribute name="foo"; rather, they refer to elements with an attribute defined to be of type ID, e.g., the id attribute in HTML 4. Many existing HTML clients don't support the use of ID-type attributes in this way, so identical values may be supplied for both of these attributes to ensure maximum forward and backward compatibility (e.g., <a id="foo" name="foo">...</a>).
Further, since the set of legal values for attributes of type ID is much smaller than for those of type CDATA, the type of the name attribute has been changed to NMTOKEN. This attribute is constrained such that it can only have the same values as type ID, or as the Name production in XML 1.0 Section 2.3, production 5. Unfortunately, this constraint cannot be expressed in the XHTML 1.0 DTDs. Because of this change, care must be taken when converting existing HTML documents. The values of these attributes must be unique within the document, valid, and any references to these fragment identifiers (both internal and external) must be updated should the values be changed during conversion.
Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information.
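The backward-compatible pattern given above translates directly into a regular expression check. A minimal Python sketch follows; the function name is_backward_compatible_id is an assumption for illustration. Note that fullmatch is used so that the whole value, not just a prefix, must match the pattern.

```python
import re

# The pattern quoted in the guideline above, anchored to the whole string.
BACKWARD_COMPATIBLE_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_backward_compatible_id(value):
    """Return True if value is safe to use as both an id and a name value."""
    return BACKWARD_COMPATIBLE_ID.fullmatch(value) is not None


for candidate in ("foo", "section-1.2", "1st", "my id", ""):
    print(repr(candidate), is_backward_compatible_id(candidate))
```

A check like this covers only the shape of each value; uniqueness within the document still has to be verified separately.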
Finally, note that XHTML 1.0 has deprecated the name attribute of the a, applet, form, frame, iframe, img, and map elements, and it will be removed from XHTML in subsequent versions.
Historically, the character encoding of an HTML document is either specified by a web server via the charset parameter of the HTTP Content-Type header, or via a meta element in the document itself. In an XML document, the character encoding of the document is specified on the XML declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>). In order to portably present documents with specific character encodings, the best approach is to ensure that the web server provides the correct headers. If this is not possible, a document that wants to set its character encoding explicitly must include both an XML declaration with an encoding declaration and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content="text/html; charset=EUC-JP" />). In XHTML-conforming user agents, the value of the encoding declaration of the XML declaration takes precedence.
Note: be aware that if a document must include the character encoding declaration in a meta http-equiv statement, that document may always be interpreted by HTTP servers and/or user agents as being of the internet media type defined in that statement. If a document is to be served as multiple media types, the HTTP server must be used to set the encoding of the document.
Some HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note this problem doesn't affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.
The Document Object Model level 1 Recommendation [DOM] defines document object model interfaces for XML and HTML 4. The HTML 4 document object model specifies that HTML element and attribute names are returned in upper-case. The XML document object model specifies that element and attribute names are returned in the case they are specified. In XHTML 1.0, elements and attributes are specified in lower-case. This apparent difference can be addressed in two ways:
User agents that access XHTML documents served as Internet media type text/html via the DOM can use the HTML DOM, and can rely upon element and attribute names being returned in upper-case from those interfaces.
User agents that access XHTML documents served as Internet media types text/xml, application/xml, or application/xhtml+xml can also use the XML DOM. Elements and attributes will be returned in lower-case. Also, some XHTML elements may or may not appear in the object tree because they are optional in the content model (e.g. the tbody element within table). This occurs because in HTML 4 some elements were permitted to be minimized such that their start and end tags are both omitted (an SGML feature). This is not possible in XML. Rather than require document authors to insert extraneous elements, XHTML has made the elements optional. User agents need to adapt to this accordingly. For further information on this topic, see [DOM2].
In both SGML and XML, the ampersand character ("&") declares the beginning of an entity reference (e.g., &reg; for the registered trademark symbol "®"). Unfortunately, many HTML user agents have silently ignored incorrect usage of the ampersand character in HTML documents, treating ampersands that do not look like entity references as literal ampersands. XML-based user agents will not tolerate this incorrect usage, and any document that uses an ampersand incorrectly will not be "valid", and consequently will not conform to this specification. In order to ensure that documents are compatible with historical HTML user agents and XML-based user agents, ampersands used in a document that are to be treated as literal characters must be expressed themselves as an entity reference (e.g. "&amp;"). For example, when the href attribute of the a element refers to a CGI script that takes parameters, it must be expressed as http://my.site.dom/cgi-bin/myscript.pl?class=guest&amp;name=user rather than as http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user.
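When markup is generated by a program, the ampersand rule is most easily met by escaping every attribute value on output. The following sketch uses quoteattr and escape from Python's xml.sax.saxutils. The anchor helper and the URL constant are illustrative assumptions; the URL is the one from the example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

URL = "http://my.site.dom/cgi-bin/myscript.pl?class=guest&name=user"


def anchor(href, label):
    """Build an a element; quoteattr escapes the value and adds the quotes."""
    return "<a href=%s>%s</a>" % (quoteattr(href), escape(label))


markup = anchor(URL, "guest")
print(markup)

# An XML processor turns the entity reference back into a literal ampersand.
print(ET.fromstring(markup).get("href") == URL)
```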
The Cascading Style Sheets level 2 Recommendation [CSS2] defines style properties which are applied to the parse tree of the HTML or XML documents. Differences in parsing will produce different visual or aural results, depending on the selectors used. The following hints will reduce this effect for documents which are served without modification as both media types:
In HTML 4 and XHTML, the style element can be used to define document-internal style rules. In XML, an XML stylesheet declaration is used to define style rules. In order to be compatible with this convention, style elements should have their fragment identifier set using the id attribute, and an XML stylesheet declaration should reference this fragment. For example:
<?xml-stylesheet href="http://www.w3.org/StyleSheets/TR/W3C-REC.css" type="text/css"?>
<?xml-stylesheet href="#internalStyle" type="text/css"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>An internal stylesheet example</title>
<style type="text/css" id="internalStyle">
  code {
    color: green;
    font-family: monospace;
    font-weight: bold;
  }
</style>
</head>
<body>
<p>
  This is text that uses our
  <code>internal stylesheet</code>.
</p>
</body>
</html>
Some characters that are legal in HTML documents are illegal in XML documents. For example, in HTML, the Formfeed character (U+000C) is treated as white space; in XHTML, due to XML's definition of characters, it is illegal.
The named character reference &apos; (the apostrophe, U+0027) was introduced in XML 1.0 but does not appear in HTML. Authors should therefore use &#39; instead of &apos; to work as expected in HTML 4 user agents.
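Because both references denote the same character, substituting one for the other does not change what an XML processor sees. The sketch below shows a deliberately naive textual substitution in Python; the function name use_numeric_apostrophe is hypothetical, and a plain string replacement like this would also rewrite occurrences inside CDATA sections or comments, so it is only suitable for markup known not to contain them.

```python
import xml.etree.ElementTree as ET


def use_numeric_apostrophe(markup):
    """Replace the XML-only named reference with its numeric equivalent."""
    return markup.replace("&apos;", "&#39;")


original = "<p>it&apos;s</p>"
converted = use_numeric_apostrophe(original)
print(converted)

# Both forms resolve to the same text for an XML processor.
print(ET.fromstring(original).text == ET.fromstring(converted).text)
```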
This appendix is informative.
This specification was written with the participation of the members of the W3C HTML Working Group.
At publication of the second edition, the membership was:
At publication of the first edition, the membership was:
This appendix is informative.