
9 Text

Contents

  1. White space
  2. Structured text
    1. Phrase elements: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR, and ACRONYM
    2. Quotations: The BLOCKQUOTE and Q elements
    3. Subscripts and superscripts: the SUB and SUP elements
  3. Lines and Paragraphs
    1. Paragraphs: the P element
    2. Controlling line breaks
    3. Hyphenation
    4. Preformatted text: The PRE element
    5. Visual rendering of paragraphs
  4. Marking document changes: The INS and DEL elements

The following sections discuss issues surrounding the structuring of text. Elements that present text (alignment elements, font elements, style sheets, etc.) are discussed elsewhere in the specification. For information about characters, please consult the section on the document character set.

9.1 White space

The document character set includes a wide variety of white space characters. Many of these are typographic elements used in some applications to produce particular visual spacing effects. In HTML, only the following characters are defined as white space characters:

  - ASCII space (&#x0020;)
  - ASCII tab (&#x0009;)
  - ASCII form feed (&#x000C;)
  - Zero-width space (&#x200B;)

Line breaks are also white space characters. Note that although &#x2028; and &#x2029; are defined in [ISO10646] to unambiguously separate lines and paragraphs, respectively, these do not constitute line breaks in HTML, nor does this specification include them in the more general category of white space characters.

This specification does not indicate the behavior, rendering or otherwise, of space characters other than those explicitly identified here as white space characters. For this reason, authors should use appropriate elements and styles to achieve visual formatting effects that involve white space, rather than space characters.

For all HTML elements except PRE, sequences of white space separate "words" (we use the term "word" here to mean "sequences of non-white space characters"). When formatting text, user agents should identify these words and lay them out according to the conventions of the particular written language (script) and target medium.

This layout may involve putting space between words (called inter-word space), but conventions for inter-word space vary from script to script. For example, in Latin scripts, inter-word space is typically rendered as an ASCII space (&#x0020;), while in Thai it is a zero-width word separator (&#x200B;). In Japanese and Chinese, inter-word space is not typically rendered at all.

Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when producing output inter-word space. This can and should be done even in the absence of language information (from the lang attribute, the HTTP "Content-Language" header field (see [RFC2616], section 14.12), user agent settings, etc.).
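The word splitting and collapsing described above can be sketched in Python. This is a minimal illustration, assuming the input is text from a non-PRE element with markup already removed; the function names and the separator parameter are hypothetical, and real line layout involves much more. The white space set is spelled out explicitly rather than taken from Python's Unicode-aware character classes, because those classes do not match the HTML definition.

```python
import re

# HTML white space: space, tab, form feed, zero-width space, plus the CR and
# LF characters that make up line breaks. U+00A0 (no-break space) and
# U+2028/U+2029 are deliberately NOT in this set.
_WHITE_SPACE_RUN = re.compile(r"[ \t\f\u200b\r\n]+")


def html_words(text: str) -> list[str]:
    """Return the 'words' of text: maximal runs of non-white-space characters."""
    return [word for word in _WHITE_SPACE_RUN.split(text) if word]


def collapse_white_space(text: str, separator: str = " ") -> str:
    """Replace each white space sequence between words with one inter-word separator.

    The separator is script dependent and chosen by the caller (hypothetical
    convention): an ASCII space for Latin scripts, U+200B for Thai, and the
    empty string for Japanese or Chinese.
    """
    return separator.join(html_words(text))
```

For example, collapse_white_space("We  offer\r\n  free") yields "We offer free", while a Thai caller would pass separator="\u200b" instead.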

The PRE element is used for preformatted text, where white space is significant.

In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag. Thus, authors, and in particular authoring tools, should write:

  <P>We offer free <A>technical support</A> for subscribers.</P>

and not:

  <P>We offer free<A> technical support </A>for subscribers.</P>
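An authoring tool could follow this advice by moving boundary white space outside the element whenever it wraps a selection in an inline element. A minimal Python sketch; wrap_inline is a hypothetical helper, attributes are omitted for brevity, and the text is assumed to be already escaped for HTML.

```python
# Characters treated as HTML white space at the element boundary.
_WS = " \t\f\r\n\u200b"


def wrap_inline(tag: str, text: str) -> str:
    """Wrap text in <tag>...</tag>, moving leading/trailing white space outside the tags."""
    lead_end = len(text) - len(text.lstrip(_WS))  # end of leading white space
    core_end = len(text.rstrip(_WS))              # start of trailing white space
    core = text[lead_end:core_end]
    if not core:
        return text  # only white space: nothing worth marking up
    return f"{text[:lead_end]}<{tag}>{core}</{tag}>{text[core_end:]}"
```

Called as "<P>We offer free" + wrap_inline("A", " technical support ") + "for subscribers.</P>", it produces the recommended form rather than the discouraged one.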

9.2 Structured text

9.2.1 Phrase elements: EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR, and ACRONYM

<!ENTITY % phrase "EM | STRONG | DFN | CODE |
                   SAMP | KBD | VAR | CITE | ABBR | ACRONYM" >

<!ELEMENT (%fontstyle;|%phrase;) - - (%inline;)*>
<!ATTLIST (%fontstyle;|%phrase;)
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

Phrase elements add structural information to text fragments. The usual meanings of phrase elements are as follows:

EM:
Indicates emphasis.
STRONG:
Indicates stronger emphasis.
CITE:
Contains a citation or a reference to other sources.
DFN:
Indicates that this is the defining instance of the enclosed term.
CODE:
Designates a fragment of computer code.
SAMP:
Designates sample output from programs, scripts, etc.
KBD:
Indicates text to be entered by the user.
VAR:
Indicates an instance of a variable or program argument.
ABBR:
Indicates an abbreviated form (e.g., WWW, HTTP, URI, Mass., etc.).
ACRONYM:
Indicates an acronym (e.g., WAC, radar, etc.).

EM and STRONG are used to indicate emphasis. The other phrase elements have particular significance in technical documents. These examples illustrate some of the phrase elements:

As <CITE>Harry S. Truman</CITE> said,
<Q lang="en-us">The buck stops here.</Q>

More information can be found in <CITE>[ISO-0000]</CITE>.

Please refer to the following reference number in future
correspondence: <STRONG>1-234-55</STRONG>

The presentation of phrase elements depends on the user agent. Generally, visual user agents present EM text in italics and STRONG text in bold font. Speech synthesizer user agents may change the synthesis parameters, such as volume, pitch and rate, accordingly.

The ABBR and ACRONYM elements allow authors to clearly indicate occurrences of abbreviations and acronyms. Western languages make extensive use of acronyms such as "GmbH", "NATO", and "F.B.I.", as well as abbreviations like "M.", "Inc.", "et al.", "etc.". Both Chinese and Japanese use analogous abbreviation mechanisms, wherein a long name is referred to subsequently with a subset of the Han characters from the original occurrence. Marking up these constructs provides useful information to user agents and tools such as spell checkers, speech synthesizers, translation systems and search-engine indexers.

The content of the ABBR and ACRONYM elements specifies the abbreviated expression itself, as it would normally appear in running text. The title attribute of these elements may be used to provide the full or expanded form of the expression.

Here are some sample uses of ABBR:

<P>
<ABBR title="World Wide Web">WWW</ABBR>
<ABBR lang="fr"
      title="Soci&eacute;t&eacute; Nationale des Chemins de Fer">
   SNCF
</ABBR>
<ABBR lang="es" title="Do&ntilde;a">Do&ntilde;a</ABBR>
<ABBR title="Abbreviation">abbr.</ABBR>
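To illustrate how a tool such as an indexer or speech synthesizer front end might use this markup, here is a Python sketch that collects each abbreviated form together with its title expansion. It relies on the standard html.parser module, which is a lenient HTML tokenizer rather than an SGML parser; AbbrCollector is a hypothetical name.

```python
from html.parser import HTMLParser


class AbbrCollector(HTMLParser):
    """Collect (abbreviated form, expansion) pairs from ABBR and ACRONYM elements."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &eacute; etc. become characters
        self.pairs = []     # list of (text, title or None)
        self._title = None
        self._text = None   # None when not inside ABBR/ACRONYM

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names and unescapes values.
        if tag in ("abbr", "acronym"):
            self._title = dict(attrs).get("title")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("abbr", "acronym") and self._text is not None:
            text = " ".join("".join(self._text).split())  # trim and collapse spacing
            self.pairs.append((text, self._title))
            self._text = None


SAMPLE = """<P>
<ABBR title="World Wide Web">WWW</ABBR>
<ABBR lang="fr"
      title="Soci&eacute;t&eacute; Nationale des Chemins de Fer">
   SNCF
</ABBR>
<ABBR lang="es" title="Do&ntilde;a">Do&ntilde;a</ABBR>
<ABBR title="Abbreviation">abbr.</ABBR>
"""

collector = AbbrCollector()
collector.feed(SAMPLE)
collector.close()
```

After feeding the sample, collector.pairs holds one entry per ABBR element, for instance ("WWW", "World Wide Web").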

Note that abbreviations and acronyms often have idiosyncratic pronunciations. For example, while "IRS" and "BBC" are typically pronounced letter by letter, "NATO" and "UNESCO" are pronounced phonetically. Still other abbreviated forms (e.g., "URI" and "SQL") are spelled out by some people and pronounced as words by other people. When necessary, authors should use style sheets to specify the pronunciation of an abbreviated form.

9.2.2 Quotations: The BLOCKQUOTE and Q elements

<!ELEMENT BLOCKQUOTE - - (%block;|SCRIPT)+ -- long quotation -->
<!ATTLIST BLOCKQUOTE
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --
  >

<!ELEMENT Q - - (%inline;)*            -- short inline quotation -->
<!ATTLIST Q
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --
  >

Start tag: required, End tag: required

Attribute definitions

cite = uri [CT]
The value of this attribute is a URI that designates a source document or message. This attribute is intended to give information about the source from which the quotation was borrowed.

Attributes defined elsewhere

These two elements designate quoted text. BLOCKQUOTE is for long quotations (block-level content) and Q is intended for short quotations (inline content) that don't require paragraph breaks.

This example formats an excerpt from "The Two Towers", by J.R.R. Tolkien, as a blockquote.

<BLOCKQUOTE cite="http://www.mycom.com/tolkien/twotowers.html">
<P>They went in single file, running like hounds on a strong scent,
and an eager light was in their eyes. Nearly due west the broad
swath of the marching Orcs tramped its ugly slot; the sweet grass
of Rohan had been bruised and blackened as they passed.</P>
</BLOCKQUOTE>

Rendering quotations 

Visual user agents generally render BLOCKQUOTE as an indented block.

Visual user agents must ensure that the content of the Q element is rendered with delimiting quotation marks. Authors should not put quotation marks at the beginning and end of the content of a Q element.

User agents should render quotation marks in a language-sensitive manner (see the lang attribute). Many languages adopt different quotation styles for outer and inner (nested) quotations, which should be respected by user agents.

The following example illustrates nested quotations with the Q element.

John said, <Q lang="en-us">I saw Lucy at lunch, she told me
<Q lang="en-us">Mary wants you
to get some ice cream on your way home.</Q> I think I will get
some at Ben and Jerry's, on Gloucester Road.</Q>

Since the language of both quotations is American English, user agents should render them appropriately, for example with single quote marks around the inner quotation and double quote marks around the outer quotation:

  John said, "I saw Lucy at lunch, she told me 'Mary wants you
  to get some ice cream on your way home.' I think I will get some
  at Ben and Jerry's, on Gloucester Road."
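The depth-dependent choice of quotation marks can be sketched in Python. Here the Q class is a hypothetical in-memory stand-in for parsed Q elements, and the mark table covers only the American English example above, using ASCII marks; a real user agent would select a table from the lang attribute, and tables for other languages are left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Q:
    """Hypothetical stand-in for a Q element: text strings and nested Q nodes."""
    children: list = field(default_factory=list)


# Quote pairs by nesting depth for American English, matching the example above:
# double marks outside, single marks inside, then alternating.
EN_US_MARKS = [('"', '"'), ("'", "'")]


def render(content, marks=EN_US_MARKS, depth=0):
    """Render a list of strings and Q nodes, choosing marks from the nesting depth."""
    out = []
    for item in content:
        if isinstance(item, Q):
            open_mark, close_mark = marks[depth % len(marks)]
            out.append(open_mark + render(item.children, marks, depth + 1) + close_mark)
        else:
            out.append(item)
    return "".join(out)


text = render([
    "John said, ",
    Q(["I saw Lucy at lunch, she told me ",
       Q(["Mary wants you to get some ice cream on your way home."]),
       " I think I will get some at Ben and Jerry's, on Gloucester Road."]),
])
```

Because the marks come from the tree, the quoted strings themselves carry no quotation marks, which matches the rule that authors should not type them inside Q.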

Note. We recommend that style sheet implementations provide a mechanism for inserting quotation marks before and after a quotation delimited by BLOCKQUOTE in a manner appropriate to the current language context and the degree of nesting of quotations.

However, as some authors have used BLOCKQUOTE merely as a mechanism to indent text, in order to preserve the intention of the authors, user agents should not insert quotation marks in the default style.

The usage of BLOCKQUOTE to indent text is deprecated in favor of style sheets.

9.2.3 Subscripts and superscripts: the SUB and SUP elements

<!ELEMENT (SUB|SUP) - - (%inline;)*    -- subscript, superscript -->
<!ATTLIST (SUB|SUP)
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attributes defined elsewhere

Many scripts (e.g., French) require superscripts or subscripts for proper rendering. The SUB and SUP elements should be used to mark up text in these cases.

      H<sub>2</sub>O
      E = mc<sup>2</sup>
      <SPAN lang="fr">M<sup>lle</sup> Dupont</SPAN>

9.3 Lines and Paragraphs

Authors traditionally divide their thoughts and arguments into sequences of paragraphs. The organization of information into paragraphs is not affected by how the paragraphs are presented: paragraphs that are double-justified contain the same thoughts as those that are left-justified.

The HTML markup for defining a paragraph is straightforward: the P element defines a paragraph.

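The visual presentation of paragraphs is not so simple. A number of issues, both stylistic and technical, must be addressed:

  - Treatment of white space
  - Line breaking and word wrapping
  - Justification
  - Hyphenation
  - Written language conventions and text directionality
  - Formatting of paragraphs with respect to surrounding content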

We address these questions below. Paragraph alignment and floating objects are discussed later in this document.

9.3.1 Paragraphs: the P element

<!ELEMENT P - O (%inline;)*            -- paragraph -->
<!ATTLIST P
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: optional

Attributes defined elsewhere

The P element represents a paragraph. It cannot contain block-level elements (including P itself).

We discourage authors from using empty P elements. User agents should ignore empty P elements.

9.3.2 Controlling line breaks

A line break is defined to be a carriage return (&#x000D;), a line feed (&#x000A;), or a carriage return/line feed pair. All line breaks constitute white space.

For more information about SGML's specification of line breaks, please consult the notes on line breaks in the appendix.
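A minimal Python sketch of this definition, assuming the document text is already decoded to a str; split_lines is a hypothetical name. The regular expression tries the CR LF pair before the single characters so that the pair counts as one line break rather than two.

```python
import re

# An HTML line break: a CR LF pair, a lone CR, or a lone LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def split_lines(text: str) -> list[str]:
    """Split text at HTML line breaks only."""
    return _LINE_BREAK.split(text)
```

Python's built-in str.splitlines() is not equivalent: it also breaks at form feed, vertical tab, U+0085, U+2028 and U+2029, none of which is an HTML line break.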

Forcing a line break: the BR element

<!ELEMENT BR - O EMPTY                 -- forced line break -->
<!ATTLIST BR
  %coreattrs;                          -- id, class, style, title --
  >

Start tag: required, End tag: forbidden

Attributes defined elsewhere

The BR element forcibly breaks (ends) the current line of text.

For visual user agents, the clear attribute can be used to determine whether markup following the BR element flows around images and other objects floated to the left or right margin, or whether it starts after the bottom of such objects. Further details are given in the section on alignment and floating objects. Authors are advised to use style sheets to control text flow around floating images and other objects.

With respect to bidirectional formatting, the BR element should behave the same way the [ISO10646] LINE SEPARATOR character behaves in the bidirectional algorithm.

Prohibiting a line break 

Sometimes authors may want to prevent a line break from occurring between two words. The &nbsp; entity (&#160; or &#xA0;) acts as a space where user agents should not cause a line break.
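Python's standard html.unescape function can confirm that the three spellings name one character, U+00A0 NO-BREAK SPACE. The sketch is illustrative only; it splits at the ASCII space, one of the HTML white space characters, to show that the no-break space is not a break opportunity.

```python
import html

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# The character entity reference and both numeric forms decode to one character.
decoded = {html.unescape(ref) for ref in ("&nbsp;", "&#160;", "&#xA0;")}

# Splitting only at the ASCII space keeps "10" and "km" in a single word.
words = html.unescape("a distance of 10&nbsp;km").split(" ")
```

Python's no-argument str.split() would be the wrong tool here: it treats U+00A0 as white space and would separate "10" from "km".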

9.3.3 Hyphenation

In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.

Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.

In HTML, the plain hyphen is represented by the "-" character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;).
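These rules can be sketched in Python for a single word. The function names are hypothetical, only one break per word is handled, and choosing which soft hyphen to break at (the line-fitting decision) is left to the caller.

```python
SHY = "\u00ad"  # SOFT HYPHEN: &shy;, &#173; or &#xAD; in HTML


def strip_soft_hyphens(text: str) -> str:
    """Soft hyphens are not displayed where no break occurs and are ignored
    for searching and sorting, so both uses can share this form."""
    return text.replace(SHY, "")


def break_at_soft_hyphen(word: str, index: int) -> tuple[str, str]:
    """Break word at its index-th soft hyphen (0-based).

    A visible plain hyphen ("-") ends the first line; the remaining soft
    hyphens, where no break occurred, are removed.
    """
    parts = word.split(SHY)
    if not 0 <= index < len(parts) - 1:
        raise ValueError(f"word has no soft hyphen at position {index}")
    first_line = "".join(parts[: index + 1]) + "-"
    second_line = "".join(parts[index + 1 :])
    return first_line, second_line


word = "hy\u00adphen\u00ada\u00adtion"
```

For this word, breaking at the second soft hyphen gives ("hyphen-", "ation"), and the unbroken form used for display and search is "hyphenation".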

9.3.4 Preformatted text: The PRE element

<!ENTITY % pre.exclusion "IMG|OBJECT|BIG|SMALL|SUB|SUP">

<!ELEMENT PRE - - (%inline;)* -(%pre.exclusion;) -- preformatted text -->
<!ATTLIST PRE
  %attrs;                              -- %coreattrs, %i18n, %events --
  >

Start tag: required, End tag: required

Attribute definitions

width = number [CN]
Deprecated. This attribute provides a hint to visual user agents about the desired width of the formatted block. The user agent can use this information to select an appropriate font size or to indent the content appropriately. The desired width is expressed in number of characters. This attribute is not widely supported currently.

Attributes defined elsewhere

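The PRE element tells visual user agents that the enclosed text is "preformatted". When handling preformatted text, visual user agents:

  - May leave white space intact.
  - May render text with a fixed-pitch font.
  - May disable automatic word wrap.
  - Must not disable bidirectional processing.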

Non-visual user agents are not required to respect extra white space in the content of a PRE element.

For more information about SGML's specification of line breaks, please consult the notes on line breaks in the appendix.

The DTD fragment above indicates which elements may not appear within a PRE element. This is the same as in HTML 3.2, and is intended to preserve constant line spacing and column alignment for text rendered in a fixed pitch font. Authors are discouraged from altering this behavior through style sheets.
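A rough Python checker for this exclusion can be built on the standard html.parser module. It is a sketch under stated assumptions, not a validator: HTMLParser does not read the DTD, so the excluded element names are copied by hand from %pre.exclusion, and the check applies at any depth inside PRE because SGML exclusions apply to all descendants. PreExclusionChecker is a hypothetical name.

```python
from html.parser import HTMLParser

# Names from %pre.exclusion, lowercased because HTMLParser reports tags in lower case.
PRE_EXCLUSION = {"img", "object", "big", "small", "sub", "sup"}


class PreExclusionChecker(HTMLParser):
    """Report excluded elements found anywhere inside a PRE element."""

    def __init__(self) -> None:
        super().__init__()
        self.pre_depth = 0
        self.violations = []  # (tag, (line, column)) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
        elif self.pre_depth and tag in PRE_EXCLUSION:
            # Exclusions apply to all descendants, not just direct children.
            self.violations.append((tag, self.getpos()))

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1


checker = PreExclusionChecker()
checker.feed("<PRE>H<SUB>2</SUB>O and <B>bold</B></PRE><P>H<SUB>2</SUB>O</P>")
checker.close()
```

Only the SUB inside PRE is reported; the SUB in the following paragraph is allowed.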

The following example shows a preformatted verse from Shelley's poem To a Skylark:

<PRE>
       Higher still and higher
         From the earth thou springest
       Like a cloud of fire;
         The blue deep thou wingest,
And singing still dost soar, and soaring ever singest.
</PRE>

Here is how this is typically rendered:

       Higher still and higher
         From the earth thou springest
       Like a cloud of fire;
         The blue deep thou wingest,
And singing still dost soar, and soaring ever singest.

The horizontal tab character
The horizontal tab character (decimal 9 in [ISO10646] and [ISO88591]) is usually interpreted by visual user agents as the smallest non-zero number of spaces necessary to line characters up along tab stops that are every 8 characters. We strongly discourage using horizontal tabs in preformatted text since it is common practice, when editing, to set the tab-spacing to other values, leading to misaligned documents.
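The tab-stop rule can be written out as a short Python function. It assumes a single line in which every character occupies one column (no wide or combining characters); expand_tabs is a hypothetical name.

```python
def expand_tabs(line: str, tab_stop: int = 8) -> str:
    """Replace each tab with the smallest non-zero number of spaces reaching the next tab stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            spaces = tab_stop - column % tab_stop  # always between 1 and tab_stop
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

For such input, Python's built-in str.expandtabs(8) gives the same result; calling either with a tab stop of 4, as many editors are configured, yields different alignment, which is the misalignment described above.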

9.3.5 Visual rendering of paragraphs

Note. The following section is an informative description of the behavior of some current visual user agents when formatting paragraphs. Style sheets allow better control of paragraph formatting.

How paragraphs are rendered visually depends on the user agent. Paragraphs are usually rendered flush left with a ragged right margin. Other defaults are appropriate for right-to-left scripts.

HTML user agents have traditionally rendered paragraphs with white space before and after, e.g.,

  At the same time, there began to take form a system of numbering,
  the calendar, hieroglyphic writing, and a technically advanced
  art, all of which later influenced other peoples.

  Within the framework of this gradual evolution or cultural
  progress the Preclassic horizon has been divided into Lower,
  Middle and Upper periods, to which can be added a transitional
  or Protoclassic period with several features that would later
  distinguish the emerging civilizations of Mesoamerica.

This contrasts with the style used in novels, which indents the first line of the paragraph and uses the regular line spacing between the final line of the current paragraph and the first line of the next, e.g.,

     At the same time, there began to take form a system of
  numbering, the calendar, hieroglyphic writing, and a technically
  advanced art, all of which later influenced other peoples.
     Within the framework of this gradual evolution or cultural
  progress the Preclassic horizon has been divided into Lower,
  Middle and Upper periods, to which can be added a transitional
  or Protoclassic period with several features that would later
  distinguish the emerging civilizations of Mesoamerica.

Following the precedent set by the NCSA Mosaic browser in 1993, user agents generally don't justify both margins, in part because it's hard to do this effectively without sophisticated hyphenation routines. The advent of style sheets and of anti-aliased fonts with subpixel positioning promises to offer HTML authors richer choices than were previously possible.

Style sheets provide rich control over the size and style of a font, the margins, space before and after a paragraph, the first line indent, justification and many other details. The user agent's default style sheet renders P elements in a familiar form, as illustrated above. One could, in principle, override this to render paragraphs without the breaks that conventionally distinguish successive paragraphs. In general, since this may confuse readers, we discourage this practice.

By convention, visual HTML user agents wrap text lines to fit within the available margins. Wrapping algorithms depend on the script being formatted.

In Western scripts, for example, text should only be wrapped at white space. Early user agents incorrectly wrapped lines just after the start tag or just before the end tag of an element, which resulted in dangling punctuation. For example, consider this sentence:

   A statue of the <A href="cih78">Cihuateteus</A>, who are patron ...

Wrapping the line just before the end tag of the A element causes the comma to be stranded at the beginning of the next line:

  A statue of the Cihuateteus
  , who are patron ...

This is an error since there was no white space at that point in the markup.
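Wrapping only at white space can be sketched as a greedy line filler in Python. This is a simplification under stated assumptions: the input is Latin-script text with the tags already removed, line length is measured in characters, and words are rejoined with an ASCII space. Because removing the tags leaves "Cihuateteus," as a single word, the comma in the example above can never start a line; and since U+00A0 is not in the white space set, &nbsp; also keeps its two words together. The wrap function is hypothetical.

```python
import re

# HTML white space (space, tab, form feed, zero-width space, CR, LF).
# U+00A0 is absent, so a no-break space glues two words together.
_WHITE_SPACE_RUN = re.compile(r"[ \t\f\u200b\r\n]+")


def wrap(text: str, width: int) -> list[str]:
    """Greedy line wrapping that breaks only at white space.

    Words longer than width are placed on a line of their own rather than split.
    """
    words = [w for w in _WHITE_SPACE_RUN.split(text) if w]
    lines, current = [], ""
    for word in words:
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines
```

With a width of 27, "A statue of the Cihuateteus, who are patron" wraps as ["A statue of the", "Cihuateteus, who are patron"].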

9.4 Marking document changes: The INS and DEL elements

<!-- INS/DEL are handled by inclusion on BODY -->
<!ELEMENT (INS|DEL) - - (%flow;)*      -- inserted text, deleted text -->
<!ATTLIST (INS|DEL)
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- info on reason for change --
  datetime    %Datetime;     #IMPLIED  -- date and time of change --
  >

Start tag: required, End tag: required

Attribute definitions

cite = uri [CT]
The value of this attribute is a URI that designates a source document or message. This attribute is intended to point to information explaining why a document was changed.
datetime = datetime [CS]
The value of this attribute specifies the date and time when the change was made.

Attributes defined elsewhere

INS and DEL are used to mark up sections of the document that have been inserted or deleted with respect to a different version of a document (e.g., in draft legislation where lawmakers need to view the changes).

These two elements are unusual for HTML in that they may serve as either block-level or inline elements (but not both). They may contain one or more words within a paragraph or contain one or more block-level elements such as paragraphs, lists and tables.

This example could be from a bill to change the legislation for how many deputies a County Sheriff can employ from 3 to 5.

<P>
  A Sheriff can employ <DEL>3</DEL><INS>5</INS> deputies.
</P>
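An authoring tool might generate markup like this example from two versions of a paragraph. The following is a sketch using Python's standard difflib module; mark_changes is a hypothetical helper, the word-level diff reflects difflib's matching heuristic rather than anything HTML requires, and it only emits inline INS and DEL, so the result never places block-level content inside them.

```python
import difflib
import html


def mark_changes(old: str, new: str) -> str:
    """Word-level diff of two versions of one paragraph, as inline DEL/INS markup.

    Words are split with str.split() and rejoined with single spaces, so the
    original spacing is not preserved.
    """
    a, b = old.split(), new.split()
    pieces = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            pieces.append(html.escape(" ".join(a[i1:i2]), quote=False))
            continue
        change = ""
        if i2 > i1:  # words only in the old version ("delete" or "replace")
            change += "<DEL>" + html.escape(" ".join(a[i1:i2]), quote=False) + "</DEL>"
        if j2 > j1:  # words only in the new version ("insert" or "replace")
            change += "<INS>" + html.escape(" ".join(b[j1:j2]), quote=False) + "</INS>"
        pieces.append(change)
    return " ".join(pieces)
```

Given "A Sheriff can employ 3 deputies." and "A Sheriff can employ 5 deputies.", it returns the paragraph content shown above.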

The INS and DEL elements must not contain block-level content when these elements behave as inline elements.

ILLEGAL EXAMPLE:
The following is not legal HTML.

<P><INS><DIV>...block-level content...</DIV></INS></P>

User agents should render inserted and deleted text in ways that make the change obvious. For instance, inserted text may appear in a special font, deleted text may not be shown at all or be shown as struck-through or with special markings, etc.

Both of the following examples correspond to November 5, 1994, 8:15:30 am, US Eastern Standard Time.

     1994-11-05T13:15:30Z
     1994-11-05T08:15:30-05:00
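Python's standard datetime module can confirm that the two values above denote the same instant. A minimal sketch, assuming Python 3.7 or later, where the %z directive of strptime accepts both "Z" and offsets written with a colon; parse_html_datetime is a hypothetical name, and it is a parser rather than a validator, since strptime tolerates some inputs (such as single-digit months) that the datetime format does not allow.

```python
from datetime import datetime, timezone

# Values have the form YYYY-MM-DDThh:mm:ssTZD, where TZD is "Z" (UTC)
# or an offset such as "-05:00".
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


def parse_html_datetime(value: str) -> datetime:
    """Parse an INS/DEL datetime attribute value into a timezone-aware datetime."""
    return datetime.strptime(value, DATETIME_FORMAT)


utc_form = parse_html_datetime("1994-11-05T13:15:30Z")
eastern_form = parse_html_datetime("1994-11-05T08:15:30-05:00")
```

Timezone-aware datetimes compare by instant, so utc_form == eastern_form is true even though the two carry different offsets.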

Used with INS, this gives:

<INS datetime="1994-11-05T08:15:30-05:00"
        cite="http://www.foo.org/mydoc/comments.html">
Furthermore, the latest figures from the marketing department
suggest that such practice is on the rise.
</INS>

The document "http://www.foo.org/mydoc/comments.html" would contain comments about why information was inserted into the document.

Authors may also make comments about inserted or deleted text by means of the title attribute for the INS and DEL elements. User agents may present this information to the user (e.g., as a popup note). For example:

<INS datetime="1994-11-05T08:15:30-05:00"
        title="Changed as a result of Steve B's comments in meeting.">
Furthermore, the latest figures from the marketing department
suggest that such practice is on the rise.
</INS>
