The following sections discuss issues surrounding the structuring of text. Elements that present text (alignment elements, font elements, style sheets, etc.) are discussed elsewhere in the specification. For information about characters, please consult the section on the document character set.
The document character set includes a wide variety of white space characters. Many of these are typographic elements used in some applications to produce particular visual spacing effects. In HTML, only the following characters are defined as white space characters:
ASCII space (&#x0020;)
ASCII tab (&#x0009;)
ASCII form feed (&#x000C;)
Zero-width space (&#x200B;)
Line breaks are also white space characters. Note that although &#x2028; and &#x2029; are defined in [ISO10646] to unambiguously separate lines and paragraphs, respectively, these do not constitute line breaks in HTML, nor does this specification include them in the more general category of white space characters.
This specification does not indicate the behavior, rendering or otherwise, of space characters other than those explicitly identified here as white space characters. For this reason, authors should use appropriate elements and styles to achieve visual formatting effects that involve white space, rather than space characters.
For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). When formatting text, user agents should identify these words and lay them out according to the conventions of the particular written language (script) and target medium.
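As an illustration of the definition above, the following Python sketch splits text into "words" using only a fixed set of white space characters. The names `html_words` and `collapse` are hypothetical, and the character set is an assumption made for the example: ASCII space, tab, form feed, zero-width space, plus the carriage return and line feed that make up line breaks. Joining words with a single ASCII space reflects a Latin-script convention only.

```python
import re

# Assumed HTML white space set: space, tab, form feed, zero-width space,
# plus CR and LF (line breaks). U+2028, U+2029 and U+00A0 are NOT included.
_SEPARATOR = re.compile(r"[\u0020\u0009\u000C\u200B\r\n]+")


def html_words(text):
    """Return the sequences of non-white-space characters in text."""
    return [word for word in _SEPARATOR.split(text) if word]


def collapse(text):
    """Join the words with one ASCII space (a Latin-script convention)."""
    return " ".join(html_words(text))
```

Characters outside the assumed set, such as the no-break space or the ISO 10646 line separator, stay inside a word in this sketch.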
This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script. For example, in Latin scripts, inter-word space is typically rendered as an ASCII space (&#x0020;), while in Thai it is a zero-width word separator (&#x200B;). In Japanese and Chinese, inter-word space is not typically rendered at all.
Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when producing output inter-word space. This can and should be done even in the absence of language information (from the lang attribute, the HTTP "Content-Language" header field, user agent settings, etc.).
The PRE element is used for preformatted text, where white space is significant.
In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag. Thus, authors, and in particular authoring tools, should write: and not:
Start tag: required, End tag: required
Attributes defined elsewhere
Phrase elements add structural information to text fragments. The usual meanings of phrase elements are as follows:
EM and STRONG are used to indicate emphasis. The other phrase elements have particular significance in technical documents. These examples illustrate some of the phrase elements:
The presentation of phrase elements depends on the user agent. Generally, visual user agents present EM text in italics and STRONG text in bold font. Speech synthesizer user agents may change the synthesis parameters, such as volume, pitch and rate, accordingly.
The ABBR and
The content of the ABBR and
Here are some sample uses of ABBR:
Note that abbreviations and acronyms often have idiosyncratic pronunciations. For example, while "IRS" and "BBC" are typically pronounced letter by letter, "NATO" and "UNESCO" are pronounced phonetically. Still other abbreviated forms (e.g., "URI" and "SQL") are spelled out by some people and pronounced as words by other people. When necessary, authors should use style sheets to specify the pronunciation of an abbreviated form.
Start tag: required, End tag: required
Attribute definitions
Attributes defined elsewhere
These two elements designate quoted text. BLOCKQUOTE is for long quotations (block-level content) and Q is intended for short quotations (inline content) that don't require paragraph breaks.
This example formats an excerpt from "The Two Towers", by J.R.R. Tolkien, as a blockquote.
Visual user agents generally render BLOCKQUOTE as an indented block.
Visual user agents must ensure that the content of the
User agents should render quotation marks in a language-sensitive manner (see the lang attribute). Many languages adopt different quotation styles for outer and inner (nested) quotations, which should be respected by user agents.
The following example illustrates nested quotations with the Q element.
Since the language of both quotations is American English, user agents should render them appropriately, for example with single quote marks around the inner quotation and double quote marks around the outer quotation:
Note. We recommend that style sheet implementations provide a mechanism for inserting quotation marks before and after a quotation delimited by
However, as some authors have used BLOCKQUOTE merely as a mechanism to indent text, in order to preserve the intention of the authors, user agents should not insert quotation marks in the default style.
The usage of BLOCKQUOTE to indent text is deprecated in favor of style sheets.
Start tag: required, End tag: required
Attributes defined elsewhere
Many scripts (e.g., French) require superscripts or subscripts for proper rendering. The SUB and
Authors traditionally divide their thoughts and arguments into sequences of paragraphs. The organization of information into paragraphs is not affected by how the paragraphs are presented: paragraphs that are double-justified contain the same thoughts as those that are left-justified.
The HTML markup for defining a paragraph is straightforward: the P element defines a paragraph.
The visual presentation of paragraphs is not so simple. A number of issues, both stylistic and technical, must be addressed:
We address these questions below. Paragraph alignment and floating objects are discussed later in this document.
Start tag: required, End tag: optional
Attributes defined elsewhere
The P element represents a paragraph. It cannot contain block-level elements (including P itself).
We discourage authors from using empty P elements. User agents should ignore empty P elements.
A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.
For more information about SGML's specification of line breaks, please consult the notes on line breaks in the appendix.
Start tag: required, End tag: forbidden
Attributes defined elsewhere
The BR element forcibly breaks (ends) the current line of text.
For visual user agents, the clear attribute can be used to determine whether markup following the BR element flows around images and other objects floated to the left or right margin, or whether it starts after the bottom of such objects. Further details are given in the section on alignment and floating objects. Authors are advised to use style sheets to control text flow around floating images and other objects.
With respect to bidirectional formatting, the BR element should behave the same way the [ISO10646] LINE SEPARATOR character behaves in the bidirectional algorithm.
Sometimes authors may want to prevent a line break from occurring between two words. The &nbsp; entity (&#160; or &#xA0;) acts as a space where user agents should not cause a line break.
In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.
Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.
In HTML, the plain hyphen is represented by the "-" character (&#45; or &#x2D;).
The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;).
Start tag: required, End tag: required
Attribute definitions
Attributes defined elsewhere
The PRE element tells visual user agents that the enclosed text is
Non-visual user agents are not required to respect extra white space in the content of a
For more information about SGML's specification of line breaks, please consult the notes on line breaks in the appendix.
The DTD fragment above indicates which elements may not appear within a PRE declaration. This is the same as in HTML 3.2, and is intended to preserve constant line spacing and column alignment for text rendered in a fixed pitch font. Authors are discouraged from altering this behavior through style sheets.
The following example shows a preformatted verse from Shelley's poem To a Skylark:
Here is how this is typically rendered:
The horizontal tab character
Note. The following section is an informative description of the behavior of some current visual user agents when formatting paragraphs. Style sheets allow better control of paragraph formatting.
How paragraphs are rendered visually depends on the user agent. Paragraphs are usually rendered flush left with a ragged right margin. Other defaults are appropriate for right-to-left scripts.
HTML user agents have traditionally rendered paragraphs with white space before and after, e.g.,
This contrasts with the style used in novels, which indents the first line of the paragraph and uses the regular line spacing between the final line of the current paragraph and the first line of the next, e.g.,
Following the precedent set by the NCSA Mosaic browser in 1993, user agents generally don't justify both margins, in part because it's hard to do this effectively without sophisticated hyphenation routines. The advent of style sheets and of anti-aliased fonts with subpixel positioning promises to offer richer choices to HTML authors than previously possible.
Style sheets provide rich control over the size and style of a font, the margins, space before and after a paragraph, the first line indent, justification and many other details. The user agent's default style sheet renders P elements in a familiar form, as illustrated above. One could, in principle, override this to render paragraphs without the breaks that conventionally distinguish successive paragraphs. In general, since this may confuse readers, we discourage this practice.
By convention, visual HTML user agents wrap text lines to fit within the available margins. Wrapping algorithms depend on the script being formatted.
In Western scripts, for example, text should only be wrapped at white space. Early user agents incorrectly wrapped lines just after the start tag or just before the end tag of an element, which resulted in dangling punctuation. For example, consider this sentence:
Wrapping the line just before the end tag of the
This is an error since there was no white space at that point in the markup.
Start tag: required, End tag: required
Attribute definitions
Attributes defined elsewhere
INS and DEL are used to mark up sections of the document that have been inserted or deleted with respect to a different version of a document (e.g., in draft legislation where lawmakers need to view the changes).
These two elements are unusual for HTML in that they may serve as either block-level or inline elements (but not both). They may contain one or more words within a paragraph or contain one or more block-level elements such as paragraphs, lists and tables.
This example could be from a bill to change the legislation for how many deputies a County Sheriff can employ from 3 to 5.
The INS and
ILLEGAL EXAMPLE:
User agents should render inserted and deleted text in ways that make the change obvious. For instance, inserted text may appear in a special font, deleted text may not be shown at all or be shown as struck-through or with special markings, etc.
Both of the following examples correspond to November 5, 1994, 8:15:30 am, US Eastern Standard Time.
Used with INS, this gives:
The document "http://www.foo.org/mydoc/comments.html" would contain comments about why information was inserted into the document.
Authors may also make comments about inserted or deleted text by means of the title attribute for the INS and DEL elements. User agents may present this information to the user (e.g., as a popup note). For example:
<P>We offer free <A>technical support</A> for subscribers.</P>
<P>We offer free<A> technical support </A>for subscribers.</P>
9.2 Structured text
9.2.1 Phrase elements: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR, and ACRONYM
<!ENTITY % phrase "EM | STRONG | DFN | CODE | SAMP | KBD | VAR | CITE | ABBR | ACRONYM" >
<!ELEMENT (%fontstyle;|%phrase;) - - (%inline;)*>
<!ATTLIST (%fontstyle;|%phrase;)
  %attrs;                              -- %coreattrs, %i18n, %events --
  >
As <CITE>Harry S. Truman</CITE> said,
<Q lang="en-us">The buck stops here.</Q>
More information can be found in <CITE>[ISO-0000]</CITE>.
Please refer to the following reference number in future
correspondence: <STRONG>1-234-55</STRONG>
<P>
  <ABBR title="World Wide Web">WWW</ABBR>
  <ABBR lang="fr" title="Société Nationale des Chemins de Fer">SNCF</ABBR>
  <ABBR lang="es" title="Doña">Doña</ABBR>
  <ABBR title="Abbreviation">abbr.</ABBR>
9.2.2 Quotations: The BLOCKQUOTE and Q elements
<!ELEMENT BLOCKQUOTE - - (%block;|SCRIPT)+ -- long quotation -->
<!ATTLIST BLOCKQUOTE
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --
  >
<!ELEMENT Q - - (%inline;)*            -- short inline quotation -->
<!ATTLIST Q
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --
  >
<BLOCKQUOTE cite="http://www.mycom.com/tolkien/twotowers.html">
<P>They went in single file, running like hounds on a strong scent,
and an eager light was in their eyes. Nearly due west the broad
swath of the marching Orcs tramped its ugly slot; the sweet grass
of Rohan had been bruised and blackened as they passed.</P>
</BLOCKQUOTE>
Rendering quotations
John said, <Q lang="en-us">I saw Lucy at lunch, she says <Q
lang="en-us">Mary wants you
to get some ice cream on your way home.</Q> I think I will get
some at Ben and Jerry's, on Gloucester Road.</Q>
John said, "I saw Lucy at lunch, she told me 'Mary wants you to get some ice cream on your way home.' I think I will get some at Ben and Jerry's, on Gloucester Road."
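The nested rendering shown above can be sketched in Python. This is only an illustration under stated assumptions: a quotation is modelled as a list whose items are either strings or nested lists (nested Q elements), the name `render_quote` is hypothetical, and the quote marks are fixed to a plain American English style (double outside, single inside) rather than chosen from the lang attribute.

```python
# Assumed en-US convention: double marks for outer quotations,
# single marks for inner ones, alternating with nesting depth.
QUOTE_MARKS = [('"', '"'), ("'", "'")]


def render_quote(children, depth=0):
    """Render one quotation; nested lists are nested quotations."""
    open_mark, close_mark = QUOTE_MARKS[depth % len(QUOTE_MARKS)]
    parts = []
    for child in children:
        if isinstance(child, str):
            parts.append(child)
        else:
            parts.append(render_quote(child, depth + 1))
    return open_mark + "".join(parts) + close_mark


inner = ["Mary wants you to get some ice cream on your way home."]
outer = ["I saw Lucy at lunch, she says ", inner, " I think I will get some."]
rendered = "John said, " + render_quote(outer)
```

A language-sensitive user agent would select the pairs in `QUOTE_MARKS` per language; the fixed table here is a simplification.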
9.2.3 Subscripts and superscripts: the SUB and SUP elements
<!ELEMENT (SUB|SUP) - - (%inline;)*    -- subscript, superscript -->
<!ATTLIST (SUB|SUP)
  %attrs;                              -- %coreattrs, %i18n, %events --
  >
H<sub>2</sub>O
E = mc<sup>2</sup>
<SPAN lang="fr">M<sup>lle</sup> Dupont</SPAN>
9.3 Lines and Paragraphs
9.3.1 Paragraphs: the P element
9.3.2 Controlling line breaks
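The definition of a line break used in this section (a carriage return, a line feed, or a carriage return/line feed pair) can be sketched in Python as follows. The helper name `split_lines` is hypothetical, and the sketch only shows how source text divides at line breaks; it does not model how a user agent lays out the result.

```python
import re

# The CR/LF pair is listed first so that it counts as one break, not two.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def split_lines(text):
    """Split text at line breaks: CR, LF, or a CR/LF pair."""
    return _LINE_BREAK.split(text)
```

Characters such as the ISO 10646 line separator (U+2028) are deliberately left alone here, since they are not line breaks in HTML.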
Forcing a line break: the BR element
Prohibiting a line break
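To illustrate how a no-break space keeps two words on the same line, here is a minimal greedy wrapping sketch in Python. The function `wrap` and its `width` parameter are hypothetical, and breaking only at ASCII spaces is a simplification: real user agents use script-dependent wrapping algorithms.

```python
NBSP = "\u00A0"  # no-break space, written &nbsp; in HTML source


def wrap(text, width):
    """Greedy line wrapping that breaks only at ASCII spaces."""
    lines, current = [], ""
    for word in text.split(" "):
        if not word:
            continue  # skip empty pieces produced by repeated spaces
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate  # fits, or is an overlong first word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines
```

Because the no-break space is never treated as a break opportunity, "Figure" and "1" joined by `NBSP` stay together even when the line is too narrow to hold them.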
9.3.3 Hyphenation
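The soft hyphen semantics of this section (show a hyphen only where a line is actually broken at a soft hyphen, and ignore soft hyphens when searching or sorting) can be sketched in Python. The function names are hypothetical, and the caller is assumed to have already decided where the line should break.

```python
SOFT_HYPHEN = "\u00AD"  # written &shy; in HTML source


def render_unbroken(text):
    """A line not broken at a soft hyphen shows no hyphen there."""
    return text.replace(SOFT_HYPHEN, "")


def break_at_soft_hyphen(word, index):
    """Break word at the soft hyphen found at position index.

    Returns the two rendered pieces; the first ends in a visible hyphen.
    """
    if word[index] != SOFT_HYPHEN:
        raise ValueError("no soft hyphen at this position")
    first = render_unbroken(word[:index]) + "-"
    second = render_unbroken(word[index + 1:])
    return first, second


def search_key(text):
    """Soft hyphens are ignored for searching and sorting."""
    return text.replace(SOFT_HYPHEN, "")
```

For example, a word carrying two soft hyphens can be broken at either one, and only the chosen break point produces a visible hyphen.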
9.3.4 Preformatted text: The PRE element
<!ENTITY % pre.exclusion "IMG|OBJECT|BIG|SMALL|SUB|SUP">
<!ELEMENT PRE - - (%inline;)* -(%pre.exclusion;) -- preformatted text -->
<!ATTLIST PRE
  %attrs;                              -- %coreattrs, %i18n, %events --
  >
<PRE>
Higher still and higher
From the earth thou springest
Like a cloud of fire;
The blue deep thou wingest,
And singing still dost soar, and soaring ever singest.
</PRE>
Higher still and higher
From the earth thou springest
Like a cloud of fire;
The blue deep thou wingest,
And singing still dost soar, and soaring ever singest.
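Horizontal tabs in preformatted text are usually expanded to tab stops placed every 8 characters. The following Python sketch shows that expansion for a single line; the function name `expand_tabs` and its `tab_width` parameter are hypothetical, and every non-tab character is assumed to occupy exactly one column.

```python
def expand_tabs(line, tab_width=8):
    """Replace each tab with the spaces needed to reach the next tab stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Smallest non-zero number of spaces that reaches a tab stop.
            spaces = tab_width - (column % tab_width)
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

Changing `tab_width` shifts every later column, which is why text aligned with tabs in one editor can look misaligned elsewhere.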
The horizontal tab character (decimal 9 in [ISO10646] and [ISO88591]) is usually interpreted by visual user agents as the smallest non-zero number of spaces necessary to line characters up along tab stops that are every 8 characters. We strongly discourage using horizontal tabs in preformatted text since it is common practice, when editing, to set the tab-spacing to other values, leading to misaligned documents.
9.3.5 Visual rendering of paragraphs
At the same time, there began to take form a system of numbering, the calendar, hieroglyphic writing, and a technically advanced art, all of which later influenced other peoples. Within the framework of this gradual evolution or cultural progress the Preclassic horizon has been divided into Lower, Middle and Upper periods, to which can be added a transitional or Protoclassic period with several features that would later distinguish the emerging civilizations of Mesoamerica.
At the same time, there began to take form a system of numbering, the calendar, hieroglyphic writing, and a technically advanced art, all of which later influenced other peoples. Within the framework of this gradual evolution or cultural progress the Preclassic horizon has been divided into Lower, Middle and Upper periods, to which can be added a transitional or Protoclassic period with several features that would later distinguish the emerging civilizations of Mesoamerica.
A statue of the <A href="cih78">Cihuateteus</A>, who are patron ...
A statue of the Cihuateteus , who are patron ...
9.4 Marking document changes: The INS and DEL elements
<!-- INS/DEL are handled by inclusion on BODY -->
<!ELEMENT (INS|DEL) - - (%flow;)*      -- inserted text, deleted text -->
<!ATTLIST (INS|DEL)
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- info on reason for change --
  datetime    %Datetime;     #IMPLIED  -- date and time of change --
  >
<P> A Sheriff can employ <DEL>3</DEL><INS>5</INS> deputies.</P>
The following is not legal HTML.
<P>
<INS><DIV>...block-level content...</DIV></INS>
</P>
1994-11-05T13:15:30Z
1994-11-05T08:15:30-05:00
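That the two values above denote the same instant can be checked with a short Python sketch. The helper name `parse_html_datetime` is hypothetical, and it handles only the two shapes shown here: a trailing "Z" for UTC or an explicit numeric offset.

```python
from datetime import datetime


def parse_html_datetime(value):
    """Parse a date/time string like the two shown above."""
    # Older versions of datetime.fromisoformat() reject the "Z" suffix,
    # so UTC is rewritten as an explicit zero offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


utc_form = parse_html_datetime("1994-11-05T13:15:30Z")
est_form = parse_html_datetime("1994-11-05T08:15:30-05:00")
same_instant = utc_form == est_form
```

Timezone-aware datetimes compare by the instant they represent, so the comparison succeeds even though the two wall-clock times differ by five hours.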
<INS datetime="1994-11-05T08:15:30-05:00"
     cite="http://www.foo.org/mydoc/comments.html">
Furthermore, the latest figures from the marketing department
suggest that such practice is on the rise.
</INS>
<INS datetime="1994-11-05T08:15:30-05:00"
     title="Changed as a result of Steve B's comments in meeting.">
Furthermore, the latest figures from the marketing department
suggest that such practice is on the rise.
</INS>