
9 Text

Contents

  1. White space
  2. Structured text
    1. Phrase elements: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR, and ACRONYM
    2. Quotations: The BLOCKQUOTE and Q elements
    3. Subscripts and superscripts: the SUB and SUP elements
  3. Lines and Paragraphs
    1. Paragraphs: the P element
    2. Controlling line breaks
    3. Hyphenation
    4. Preformatted text: The PRE element
    5. Visual rendering of paragraphs
  4. Marking document changes: The INS and DEL elements

The following sections discuss issues surrounding the structuring of text. Elements that present text (alignment elements, font elements, style sheets, etc.) are discussed elsewhere in the specification. For information about characters, please consult the section on the document character set.

9.1 White space

The document character set includes a wide variety of white space characters. Many of these are typographic elements used in some applications to produce particular visual spacing effects. In HTML, only the following characters are defined as white space characters:

  * ASCII space (&#x0020;)
  * ASCII tab (&#x0009;)
  * ASCII form feed (&#x000C;)
  * Zero-width space (&#x200B;)

Line breaks are also white space characters. Note that although &#x2028; and &#x2029; are defined in [ISO10646] to unambiguously separate lines and paragraphs, respectively, these do not constitute line breaks in HTML, nor does this specification include them in the more general category of white space characters.

This specification does not indicate the behavior, rendering or otherwise, of space characters other than those explicitly identified here as white space characters. For this reason, authors should use appropriate elements and styles to achieve visual formatting effects that involve white space, rather than space characters.

For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). When formatting text, user agents should identify these words and lay them out according to the conventions of the particular written language (script) and target medium.

This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script. For example, in Latin scripts, inter-word space is typically rendered as an ASCII space (&#x0020;), while in Thai it is a zero-width word separator (&#x200B;). In Japanese and Chinese, inter-word space is not typically rendered at all.

Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when producing output inter-word space. This can and should be done even in the absence of language information (from the lang attribute, the HTTP "Content-Language" header field (see [RFC2068], section 14.13), user agent settings, etc.).
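
The collapsing rule can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the names HTML_WHITE_SPACE, collapse_white_space and words are hypothetical, and emitting a single ASCII space as the inter-word space is an assumption that suits Latin scripts only.

```python
import re

# White space as defined in section 9.1 (space, tab, form feed,
# zero-width space), plus the line-break characters CR and LF.
# U+2028, U+2029 and U+00A0 are deliberately absent.
HTML_WHITE_SPACE = re.compile("[ \t\f\u200b\r\n]+")


def collapse_white_space(text):
    """Replace every run of HTML white space with one ASCII space."""
    return HTML_WHITE_SPACE.sub(" ", text)


def words(text):
    """Return the "words": maximal runs of non-white-space characters."""
    return [w for w in HTML_WHITE_SPACE.split(text) if w]


source = "We  offer\tfree\r\n   technical support"
print(collapse_white_space(source))  # We offer free technical support
print(words(source))
```

A user agent for Thai, Japanese or Chinese would pick a different output separator (or none) while keeping the same word-splitting step.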

The PRE element is used for preformatted text, where white space is significant.

In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag. Thus, authors, and in particular authoring tools, should write:

  <P>We offer free <A>technical support</A> for subscribers.</P>

and not:

  <P>We offer free<A> technical support </A>for subscribers.</P>
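
An authoring tool could check for this pattern before saving a document. The following is a rough sketch under stated assumptions: the function edge_white_space and both patterns are hypothetical, the regular expressions do not cope with ">" inside attribute values, and PRE content (where such white space is legitimate) is not excluded.

```python
import re

# Hypothetical lint patterns, not taken from the specification:
# white space right after a start tag, or right before an end tag.
AFTER_START = re.compile(r"<[A-Za-z][^<>]*>[ \t\f\r\n]")
BEFORE_END = re.compile(r"[ \t\f\r\n]</[A-Za-z][^<>]*>")


def edge_white_space(markup):
    """Return the tag fragments that have white space on the inside edge."""
    hits = [m.group(0) for m in AFTER_START.finditer(markup)]
    hits += [m.group(0) for m in BEFORE_END.finditer(markup)]
    return hits


good = "<P>We offer free <A>technical support</A> for subscribers.</P>"
bad = "<P>We offer free<A> technical support </A>for subscribers.</P>"
print(edge_white_space(good))  # []
print(edge_white_space(bad))   # ['<A> ', ' </A>']
```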

9.2 Structured text

9.2.1 Phrase elements: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR, and ACRONYM

<!ENTITY % phrase "EM | STRONG | DFN | CODE |
                   SAMP | KBD | VAR | CITE | ABBR | ACRONYM" >
<!ELEMENT (%fontstyle;|%phrase;) - - (%inline;)*>
<!ATTLIST (%fontstyle;|%phrase;)
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

Phrase elements add structural information to text fragments. The usual meanings of phrase elements are as follows:

EM:
Indicates emphasis.
STRONG:
Indicates stronger emphasis.
CITE:
Contains a citation or a reference to other sources.
DFN:
Indicates that this is the defining instance of the enclosed term.
CODE:
Designates a fragment of computer code.
SAMP:
Designates sample output from programs, scripts, etc.
KBD:
Indicates text to be entered by the user.
VAR:
Indicates an instance of a variable or program argument.
ABBR:
Indicates an abbreviated form (e.g., WWW, HTTP, URI, Mass., etc.).
ACRONYM:
Indicates an acronym (e.g., WAC, radar, etc.).

EM and STRONG are used to indicate emphasis. The other phrase elements have particular significance in technical documents. These examples illustrate some of the phrase elements:

As <CITE>Harry S. Truman</CITE> said,
<Q lang="en-us">The buck stops here.</Q>

More information can be found in <CITE>[ISO-0000]</CITE>.

Please refer to the following reference number in future
correspondence: <STRONG>1-234-55</STRONG>

The presentation of phrase elements depends on the user agent. Generally, visual user agents present EM text in italics and STRONG text in bold font. Speech synthesizer user agents may change the synthesis parameters, such as volume, pitch and rate accordingly.

The ABBR and ACRONYM elements allow authors to clearly indicate occurrences of abbreviations and acronyms. Western languages make extensive use of acronyms such as "GmbH", "NATO", and "F.B.I.", as well as abbreviations like "M.", "Inc.", "et al.", "etc.". Both Chinese and Japanese use analogous abbreviation mechanisms, wherein a long name is referred to subsequently with a subset of the Han characters from the original occurrence. Marking up these constructs provides useful information to user agents and tools such as spell checkers, speech synthesizers, translation systems and search-engine indexers.

The content of the ABBR and ACRONYM elements specifies the abbreviated expression itself, as it would normally appear in running text. The title attribute of these elements may be used to provide the full or expanded form of the expression.

Here are some sample uses of ABBR:

  <P>
  <ABBR title="World Wide Web">WWW</ABBR>
  <ABBR lang="fr"
        title="Soci&eacute;t&eacute; Nationale des Chemins de Fer">
     SNCF
  </ABBR>
  <ABBR lang="es" title="Do&ntilde;a">Do&ntilde;a</ABBR>
  <ABBR title="Abbreviation">abbr.</ABBR>
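
A tool such as a spell checker or indexer might gather the expansions like this. It is a minimal sketch built on Python's standard html.parser module; the class AbbrCollector and the function expansions are hypothetical names, and nested ABBR elements are not handled.

```python
from html.parser import HTMLParser


class AbbrCollector(HTMLParser):
    """Collect (abbreviated form, expansion) pairs from ABBR and ACRONYM."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = False
        self._title = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("abbr", "acronym"):  # HTMLParser lowercases tag names
            self._open = True
            self._title = dict(attrs).get("title")
            self._text = []

    def handle_data(self, data):
        if self._open:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("abbr", "acronym") and self._open:
            # Collapse white space inside the element content.
            short = " ".join("".join(self._text).split())
            self.pairs.append((short, self._title))
            self._open = False


def expansions(markup):
    parser = AbbrCollector()
    parser.feed(markup)
    parser.close()
    return parser.pairs


sample = ('<P><ABBR title="World Wide Web">WWW</ABBR>\n'
          '<ABBR lang="fr" title="Soci&eacute;t&eacute; Nationale des '
          'Chemins de Fer">\n   SNCF\n</ABBR>')
print(expansions(sample))
```

The title value is None when the author did not supply an expansion, which lets a caller distinguish "no expansion given" from an empty one.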

Note that abbreviations and acronyms often have idiosyncratic pronunciations. For example, while "IRS" and "BBC" are typically pronounced letter by letter, "NATO" and "UNESCO" are pronounced phonetically. Still other abbreviated forms (e.g., "URI" and "SQL") are spelled out by some people and pronounced as words by other people. When necessary, authors should use style sheets to specify the pronunciation of an abbreviated form.

9.2.2 Quotations: The BLOCKQUOTE and Q elements

<!ELEMENT BLOCKQUOTE - - (%block;|SCRIPT)+ -- long quotation -->
<!ATTLIST BLOCKQUOTE
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --
  >
<!ELEMENT Q - - (%inline;)*            -- short inline quotation -->
<!ATTLIST Q
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --
  >

Start tag: required, End tag: required

Attribute definitions

cite = uri [CT]
The value of this attribute is a URI that designates a source document or message. This attribute is intended to give information about the source from which the quotation was borrowed.

Attributes defined elsewhere

These two elements designate quoted text. BLOCKQUOTE is for long quotations (block-level content) and Q is intended for short quotations (inline content) that don't require paragraph breaks.

This example formats an excerpt from "The Two Towers", by J.R.R. Tolkien, as a blockquote.

<BLOCKQUOTE cite="http://www.mycom.com/tolkien/twotowers.html">
<P>They went in single file, running like hounds on a strong scent,
and an eager light was in their eyes. Nearly due west the broad
swath of the marching Orcs tramped its ugly slot; the sweet grass
of Rohan had been bruised and blackened as they passed.</P>
</BLOCKQUOTE>

Rendering quotations 

Visual user agents generally render BLOCKQUOTE as an indented block.

Visual user agents must ensure that the content of the Q element is rendered with delimiting quotation marks. Authors should not put quotation marks at the beginning and end of the content of a Q element.

User agents should render quotation marks in a language-sensitive manner (see the lang attribute). Many languages adopt different quotation styles for outer and inner (nested) quotations, which should be respected by user-agents.

The following example illustrates nested quotations with the Q element.

John said, <Q lang="en-us">I saw Lucy at lunch, she says
<Q lang="en-us">Mary wants you
to get some ice cream on your way home.</Q> I think I will get
some at Ben and Jerry's, on Gloucester Road.</Q>

Since the language of both quotations is American English, user agents should render them appropriately, for example with single quote marks around the inner quotation and double quote marks around the outer quotation:

  John said, "I saw Lucy at lunch, she says 'Mary wants you
  to get some ice cream on your way home.' I think I will get some
  at Ben and Jerry's, on Gloucester Road."
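
One way a user agent might choose the marks is to track nesting depth. The sketch below makes several assumptions: the names QuoteRenderer, render_quotes and EN_US_QUOTES are hypothetical, only American English conventions are modeled (the lang attribute is ignored), and straight ASCII quotes stand in for typographic ones.

```python
from html.parser import HTMLParser

# Assumed convention for American English: double marks at the outer
# level, single marks one level in, alternating for deeper nesting.
EN_US_QUOTES = [('"', '"'), ("'", "'")]


class QuoteRenderer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.depth = 0  # number of currently open Q elements

    def handle_starttag(self, tag, attrs):
        if tag == "q":
            self.out.append(EN_US_QUOTES[self.depth % 2][0])
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "q" and self.depth > 0:
            self.depth -= 1
            self.out.append(EN_US_QUOTES[self.depth % 2][1])

    def handle_data(self, data):
        self.out.append(data)


def render_quotes(markup):
    renderer = QuoteRenderer()
    renderer.feed(markup)
    renderer.close()
    # Collapse white space the way inline text is normally laid out.
    return " ".join("".join(renderer.out).split())


print(render_quotes(
    'John said, <Q lang="en-us">I saw Lucy, she says '
    '<Q lang="en-us">Mary wants ice cream.</Q> I will get some.</Q>'
))
```

A fuller implementation would look up the quote pairs from the language in effect at each Q element rather than from a single table.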

Note. We recommend that style sheet implementations provide a mechanism for inserting quotation marks before and after a quotation delimited by BLOCKQUOTE in a manner appropriate to the current language context and the degree of nesting of quotations.

However, as some authors have used BLOCKQUOTE merely as a mechanism to indent text, in order to preserve the intention of the authors, user agents should not insert quotation marks in the default style.

The usage of BLOCKQUOTE to indent text is deprecated in favor of style sheets.

9.2.3 Subscripts and superscripts: the SUB and SUP elements

<!ELEMENT (SUB|SUP) - - (%inline;)*    -- subscript, superscript -->
<!ATTLIST (SUB|SUP)
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

Many scripts (e.g., French) require superscripts or subscripts for proper rendering. The SUB and SUP elements should be used to mark up text in these cases.

      H<sub>2</sub>O
      E = mc<sup>2</sup>
      <SPAN lang="fr">M<sup>lle</sup> Dupont</SPAN>

9.3 Lines and Paragraphs

Authors traditionally divide their thoughts and arguments into sequences of paragraphs. The organization of information into paragraphs is not affected by how the paragraphs are presented: paragraphs that are double-justified contain the same thoughts as those that are left-justified.

The HTML markup for defining a paragraph is straightforward: the P element defines a paragraph.

The visual presentation of paragraphs is not so simple. A number of issues, both stylistic and technical, must be addressed:

  * Treatment of white space
  * Line breaking and word wrapping
  * Justification
  * Hyphenation
  * Written language conventions and text directionality
  * Formatting of paragraphs with respect to surrounding content

We address these questions below. Paragraph alignment and floating objects are discussed later in this document.

9.3.1 Paragraphs: the P element

<!ELEMENT P - O (%inline;)*            -- paragraph -->
<!ATTLIST P
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: optional

Attributes defined elsewhere

The P element represents a paragraph. It cannot contain block-level elements (including P itself).

We discourage authors from using empty P elements. User agents should ignore empty P elements.

9.3.2 Controlling line breaks

A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.
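
The definition translates directly into a small pattern. This is an illustrative sketch with hypothetical names (LINE_BREAK, split_lines, count_line_breaks); it is written out by hand because Python's built-in str.splitlines() also splits on characters such as U+2028 and U+2029, which are not line breaks in HTML.

```python
import re

# CR LF is listed first so that the pair counts as a single line break.
LINE_BREAK = re.compile("\r\n|\r|\n")


def split_lines(text):
    """Split text at HTML line breaks only."""
    return LINE_BREAK.split(text)


def count_line_breaks(text):
    return len(LINE_BREAK.findall(text))


print(split_lines("a\r\nb\rc\nd"))        # ['a', 'b', 'c', 'd']
print(count_line_breaks("a\r\n\r\nb"))    # 2
```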

For more information about SGML's specification of line breaks, please consult the notes on line breaks in the appendix.

Forcing a line break: the BR element

<!ELEMENT BR - O EMPTY                 -- forced line break -->
<!ATTLIST BR
  %coreattrs;                          -- id, class, style, title --
  >

Start tag: required, End tag: forbidden

Attributes defined elsewhere

The BR element forcibly breaks (ends) the current line of text.

For visual user agents, the clear attribute can be used to determine whether markup following the BR element flows around images and other objects floated to the left or right margin, or whether it starts after the bottom of such objects. Further details are given in the section on alignment and floating objects. Authors are advised to use style sheets to control text flow around floating images and other objects.

With respect to bidirectional formatting, the BR element should behave the same way the [ISO10646] LINE SEPARATOR character behaves in the bidirectional algorithm.

Prohibiting a line break 

Sometimes authors may want to prevent a line break from occurring between two words. The &nbsp; entity (&#160; or &#xA0;) acts as a space where user agents should not cause a line break.
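
The effect on line wrapping can be illustrated with a simple greedy wrapper. This is a sketch only: the function wrap is hypothetical, it treats the ASCII space as the sole break opportunity, and it lets an over-long word overflow rather than splitting it.

```python
NBSP = "\u00a0"  # the character behind &nbsp;


def wrap(text, width):
    """Greedy wrap that breaks only at ASCII spaces, never at NBSP."""
    lines, current = [], ""
    for word in text.split(" "):
        if not word:
            continue  # skip empty pieces produced by repeated spaces
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


print(wrap("10 km away", 4))           # ['10', 'km', 'away']
print(wrap("10" + NBSP + "km away", 4))  # "10" and "km" stay on one line
```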

9.3.3 Hyphenation

In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.

Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.

In HTML, the plain hyphen is represented by the "-" character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;).
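
The soft hyphen semantics can be sketched as three small functions. The names render_unbroken, break_word and contains are hypothetical, and widths are counted in characters, which assumes a fixed-pitch layout.

```python
SHY = "\u00ad"  # soft hyphen, the character behind &shy;


def render_unbroken(text):
    """No break taken: soft hyphens must not be displayed."""
    return text.replace(SHY, "")


def break_word(word, room):
    """Break at the last soft hyphen whose head, plus a visible hyphen,
    fits in `room` columns. Returns (head, tail) or None if nothing fits.
    The tail keeps its own soft hyphens so it can be broken again."""
    for i in range(len(word) - 1, -1, -1):
        if word[i] == SHY:
            head = render_unbroken(word[:i]) + "-"
            if len(head) <= room:
                return head, word[i + 1:]
    return None


def contains(haystack, needle):
    """Search that ignores soft hyphens on both sides."""
    return render_unbroken(needle) in render_unbroken(haystack)


word = "hy" + SHY + "phen" + SHY + "a" + SHY + "tion"
print(render_unbroken(word))  # hyphenation
print(break_word(word, 7))
```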

9.3.4 Preformatted text: The PRE element

<!ENTITY % pre.exclusion "IMG|OBJECT|BIG|SMALL|SUB|SUP">
<!ELEMENT PRE - - (%inline;)* -(%pre.exclusion;) -- preformatted text -->
<!ATTLIST PRE
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attribute definitions

width = number [CN]
Deprecated. This attribute provides a hint to visual user agents about the desired width of the formatted block. The user agent can use this information to select an appropriate font size or to indent the content appropriately. The desired width is expressed in number of characters. This attribute is not widely supported currently.

Attributes defined elsewhere

The PRE element tells visual user agents that the enclosed text is "preformatted". When handling preformatted text, visual user agents:

  * May leave white space intact.
  * May render text with a fixed-pitch font.
  * May disable automatic word wrap.
  * Must not disable bidirectional processing.

Non-visual user agents are not required to respect extra white space in the content of a PRE element.

For more information about SGML's specification of line breaks, please consult the notes on line breaks in the appendix.

The DTD fragment above indicates which elements may not appear within a PRE declaration. This is the same as in HTML 3.2, and is intended to preserve constant line spacing and column alignment for text rendered in a fixed pitch font. Authors are discouraged from altering this behavior through style sheets.

The following example shows a preformatted verse from Shelley's poem To a Skylark:

<PRE>
       Higher still and higher
         From the earth thou springest
       Like a cloud of fire;
         The blue deep thou wingest,
And singing still dost soar, and soaring ever singest.
</PRE>

Here is how this is typically rendered:

       Higher still and higher
         From the earth thou springest
       Like a cloud of fire;
         The blue deep thou wingest,
And singing still dost soar, and soaring ever singest.

The horizontal tab character
The horizontal tab character (decimal 9 in [ISO10646] and [ISO88591]) is usually interpreted by visual user agents as the smallest non-zero number of spaces necessary to line characters up along tab stops that are every 8 characters. We strongly discourage using horizontal tabs in preformatted text since it is common practice, when editing, to set the tab-spacing to other values, leading to misaligned documents.
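
The tab-stop rule amounts to simple modular arithmetic. The sketch below uses a hypothetical function expand_tabs and assumes a single line with no line breaks and one column per character; the second call shows the misalignment that arises when an editor uses a different tab width.

```python
def expand_tabs(line, tab_width=8):
    """Replace each tab with the spaces needed to reach the next tab stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Smallest non-zero number of spaces reaching the next stop.
            pad = tab_width - (column % tab_width)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)


print(repr(expand_tabs("ab\tc")))     # 'ab      c'  (6 spaces)
print(repr(expand_tabs("ab\tc", 4)))  # 'ab  c'      (2 spaces)
```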

9.3.5 Visual rendering of paragraphs

Note. The following section is an informative description of the behavior of some current visual user agents when formatting paragraphs. Style sheets allow better control of paragraph formatting.

How paragraphs are rendered visually depends on the user agent. Paragraphs are usually rendered flush left with a ragged right margin. Other defaults are appropriate for right-to-left scripts.

HTML user agents have traditionally rendered paragraphs with white space before and after, e.g.,

  At the same time, there began to take form a system of numbering,
  the calendar, hieroglyphic writing, and a technically advanced
  art, all of which later influenced other peoples.

  Within the framework of this gradual evolution or cultural
  progress the Preclassic horizon has been divided into Lower,
  Middle and Upper periods, to which can be added a transitional
  or Protoclassic period with several features that would later
  distinguish the emerging civilizations of Mesoamerica.

This contrasts with the style used in novels which indents the first line of the paragraph and uses the regular line spacing between the final line of the current paragraph and the first line of the next, e.g.,

     At the same time, there began to take form a system of
  numbering, the calendar, hieroglyphic writing, and a technically
  advanced art, all of which later influenced other peoples.
     Within the framework of this gradual evolution or cultural
  progress the Preclassic horizon has been divided into Lower,
  Middle and Upper periods, to which can be added a transitional
  or Protoclassic period with several features that would later
  distinguish the emerging civilizations of Mesoamerica.

Following the precedent set by the NCSA Mosaic browser in 1993, user agents generally don't justify both margins, in part because it's hard to do this effectively without sophisticated hyphenation routines. The advent of style sheets and of anti-aliased fonts with subpixel positioning promises to offer HTML authors richer choices than were previously possible.

Style sheets provide rich control over the size and style of a font, the margins, space before and after a paragraph, the first line indent, justification and many other details. The user agent's default style sheet renders P elements in a familiar form, as illustrated above. One could, in principle, override this to render paragraphs without the breaks that conventionally distinguish successive paragraphs. In general, since this may confuse readers, we discourage this practice.

By convention, visual HTML user agents wrap text lines to fit within the available margins. Wrapping algorithms depend on the script being formatted.

In Western scripts, for example, text should only be wrapped at white space. Early user agents incorrectly wrapped lines just after the start tag or just before the end tag of an element, which resulted in dangling punctuation. For example, consider this sentence:

   A statue of the <A href="cih78">Cihuateteus</A>, who are patron ...

Wrapping the line just before the end tag of the A element causes the comma to be stranded at the beginning of the next line:

  A statue of the Cihuateteus
  , who are patron ...

This is an error since there was no white space at that point in the markup.

9.4 Marking document changes: The INS and DEL elements

<!-- INS/DEL are handled by inclusion on BODY -->
<!ELEMENT (INS|DEL) - - (%flow;)*      -- inserted text, deleted text -->
<!ATTLIST (INS|DEL)
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- info on reason for change --
  datetime    %Datetime;     #IMPLIED  -- date and time of change --
  >

Start tag: required, End tag: required

Attribute definitions

cite = uri [CT]
The value of this attribute is a URI that designates a source document or message. This attribute is intended to point to information explaining why a document was changed.
datetime = datetime [CS]
The value of this attribute specifies the date and time when the change was made.

Attributes defined elsewhere

INS and DEL are used to mark up sections of the document that have been inserted or deleted with respect to a different version of a document (e.g., in draft legislation where lawmakers need to view the changes).

These two elements are unusual for HTML in that they may serve as either block-level or inline elements (but not both). They may contain one or more words within a paragraph or contain one or more block-level elements such as paragraphs, lists and tables.

This example could be from a bill to change the legislation for how many deputies a County Sheriff can employ from 3 to 5.

<P>
  A Sheriff can employ <DEL>3</DEL><INS>5</INS> deputies.
</P>
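
A revision tool could recover both versions of the text from such markup. The following is a minimal sketch using Python's standard html.parser module; ChangeViewer and versions are hypothetical names, and only text content is reconstructed, not the surrounding markup.

```python
from html.parser import HTMLParser


class ChangeViewer(HTMLParser):
    """Rebuild the text before and after the marked changes."""

    def __init__(self):
        super().__init__()
        self.before = []  # text of the earlier version
        self.after = []   # text of the later version
        self._stack = []  # currently open INS/DEL elements

    def handle_starttag(self, tag, attrs):
        if tag in ("ins", "del"):  # HTMLParser lowercases tag names
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in ("ins", "del") and self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        if "ins" not in self._stack:
            self.before.append(data)  # inserted text is new, so skip it
        if "del" not in self._stack:
            self.after.append(data)   # deleted text is gone, so skip it


def versions(markup):
    viewer = ChangeViewer()
    viewer.feed(markup)
    viewer.close()
    tidy = lambda parts: " ".join("".join(parts).split())
    return tidy(viewer.before), tidy(viewer.after)


old, new = versions(
    "<P>  A Sheriff can employ <DEL>3</DEL><INS>5</INS> deputies.</P>"
)
print(old)  # A Sheriff can employ 3 deputies.
print(new)  # A Sheriff can employ 5 deputies.
```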

The INS and DEL elements must not contain block-level content when these elements behave as inline elements.

ILLEGAL EXAMPLE:
The following is not legal HTML.

<P><INS><DIV>...block-level content...</DIV></INS></P>

User agents should render inserted and deleted text in ways that make the change obvious. For instance, inserted text may appear in a special font, deleted text may not be shown at all or be shown as struck-through or with special markings, etc.

Both of the following examples correspond to November 5, 1994, 8:15:30 am, US Eastern Standard Time.

     1994-11-05T13:15:30Z
     1994-11-05T08:15:30-05:00
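
That the two forms name the same instant can be checked with Python's standard datetime module. In this sketch the helper parse_html_datetime is a hypothetical name, and only the full date-time form with seconds shown above is handled.

```python
from datetime import datetime, timezone


def parse_html_datetime(value):
    """Parse a full datetime value into a timezone-aware datetime."""
    # "Z" is rewritten because datetime.fromisoformat() accepts it
    # directly only in newer Python versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


utc_form = parse_html_datetime("1994-11-05T13:15:30Z")
est_form = parse_html_datetime("1994-11-05T08:15:30-05:00")

print(utc_form == est_form)  # True: aware datetimes compare by instant
print(est_form.astimezone(timezone.utc).isoformat())
```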

Used with INS, this gives:

<INS datetime="1994-11-05T08:15:30-05:00"
        cite="http://www.foo.org/mydoc/comments.html">
Furthermore, the latest figures from the marketing department
suggest that such practice is on the rise.
</INS>

The document "http://www.foo.org/mydoc/comments.html" would contain comments about why information was inserted into the document.

Authors may also make comments about inserted or deleted text by means of the title attribute for the INS and DEL elements. User agents may present this information to the user (e.g., as a popup note). For example:

<INS datetime="1994-11-05T08:15:30-05:00"
        title="Changed as a result of Steve B's comments in meeting.">
Furthermore, the latest figures from the marketing department
suggest that such practice is on the rise.
</INS>
