Copyright © 2024 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
This specification defines a Hypothetical Render Model (HRM) that constrains the presentation complexity of documents that conform to the Text Profiles specified in any edition of Internet Media Subtitles and Captions ([IMSC]).
The objective of the HRM is to allow subtitle and caption authors and providers to verify that the content they provide does not exceed defined complexity levels, so that playback systems can render the content synchronized with the author-specified display times.
The model is not intended as a specification of the processing requirements for implementations. For instance, while the model defines a glyph cache for the purpose of modelling how the number of glyph drawing operations can be reduced, it neither requires the implementation of such a cache, nor models the sub-pixel glyph positioning and anti-aliased glyph rendering that can be used to produce text output.
Furthermore, the model is not intended to constrain readability complexity.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Timed Text Working Group as a Recommendation using the Recommendation track.
The history of substantive changes made to this document is summarized at F. Summary of substantive changes.
W3C recommends the wide deployment of this specification as a standard for the Web.
A W3C Recommendation is a specification that, after extensive consensus-building, is endorsed by W3C and its Members, and has commitments from Working Group members to royalty-free licensing for implementations. Future updates to this Recommendation may incorporate new features.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 03 November 2023 W3C Process Document.
This specification defines a Hypothetical Render Model (HRM) that constrains the presentation complexity of an IMSC Document Instance.
This specification uses the same conventions as [IMSC].
character. The character code property of a [TTML2] Character Information Item.
The term character is for practical purposes the same as a code point, as defined by [i18n-glossary].
empty ISD. An Intermediate Synchronic Document with no presented region.
non-empty ISD. An Intermediate Synchronic Document with at least one presented region.
error. A failure to conform to the constraints defined by this specification.
grapheme. As defined by [i18n-glossary] at grapheme.
Intermediate Synchronic Document. As defined by [TTML2] at Intermediate Synchronic Document.
IMSC Document Instance. A [TTML2] Document Instance that conforms to the Text Profile defined in any edition of [IMSC].
presentation processor. As defined by [TTML2] at presentation processor.
presented region. As defined by [IMSC] at presented region.
Related Video Object. As defined by [IMSC] at Related Video Object.
Root Container Region. As defined by [TTML2] at Root Container Region.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key word SHALL in this document is to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, it appears in all capitals, as shown here.
Unless noted otherwise, this specification applies to an IMSC Document Instance.
An IMSC Document Instance conforms to the Hypothetical Render Model if the sequence of Intermediate Synchronic Documents generated from it using the Intermediate Synchronic Document Construction procedure specified in [TTML2] is processed without error by the HRM algorithm specified at 7. Algorithm.
Applying the Hypothetical Render Model to a Document Instance that is not an IMSC Document Instance yields results that might not reflect the complexity of the Document Instance.
In applications where sequences of Document Instances can be resolved into a single sequence of Intermediate Synchronic Documents that do not overlap each other temporally, conformance can be determined based on a synthesised Document Instance that generates an equivalent sequence of Intermediate Synchronic Documents, where minimal equivalence is limited to the content and metrics that are used to identify errors.
This section is non-normative.
The objective of the HRM is to allow subtitle and caption authors and providers to verify that the content they provide does not exceed defined complexity levels, so that playback systems can render the content synchronized with the author-specified display times.
Playback systems include desktop computers, mobile devices and home theatre devices.
The HRM is not a new concept: a version of it has been included in all versions and editions of [IMSC]. This specification extracts the HRM into a standalone document to simplify maintenance. The First Public Working Draft of this specification essentially included the HRM as it was specified in [ttml-imsc1.2]. Substantive changes made since then are summarized in F. Summary of substantive changes.
IMSC Document Instances are typically authored by a first party and rendered by a second party. Unless both parties agree on the maximum complexity of an IMSC Document Instance, it is likely that:
As illustrated in Figure 1, by defining a method (the HRM) to compute a proxy for the complexity of an IMSC Document Instance and specifying a complexity limit based on such a proxy:
The HRM supplements the syntactic and structural constraints imposed in [IMSC] by imposing constraints on the contents of the presentation.
Because of the temporal and spatial variability of subtitles and captions across types of content, territories and languages, it is not possible to limit the complexity of an IMSC Document Instance using only average values.
An average-based constraint of 840 characters per minute could be met in multiple ways, with different rendering complexities. Contrast two potential approaches:
In the first, 5 characters are presented for a fraction of a second, followed by 835 characters that are then presented for over 59 seconds. This generates a high rendering complexity for the 835 characters, since there is only a brief time available to paint them.
In the second, 210 characters are painted every 15 seconds, giving 15 seconds to prepare for the next presentation. This has a much lower rendering complexity.
The HRM achieves a more accurate representation of the complexity of an IMSC Document Instance at any given time by taking into account its past complexity in addition to its instantaneous complexity. The same approach is commonly used in video to limit bitstream complexity, e.g., the Hypothetical Reference Decoder (HRD) specified in [iso14496-10].
The HRM defines a simple model for the rendering of subtitles and captions, and uses the time it takes to render subtitles and captions according to that model as a proxy for the complexity of the subtitles and captions. Rendering includes drawing region backgrounds, rendering text and copying text. Complexity is then limited by requiring that the time to render one subtitle or caption is shorter than the time elapsed since the previous subtitle or caption.
This simple model requires only a static analysis of the IMSC Document Instance, requires no fetching of external resources and does not require the IMSC Document Instance to actually be rendered. Several simplifying assumptions are made to achieve this. For example, the model assumes that each character is drawn independently, and compensates for that assumption being false in many cases by assigning different render speeds to different scripts. In general the model is not intended to capture the actual time that an implementation takes to render subtitles and captions, but rather to scale with it: a document that is twice as complex according to the model would require roughly twice as many resources to actually render.
The HRM is typically used prior to distribution of the IMSC Document Instance to the end-user, as an integral part of authoring and as a quality check before distribution.
When the HRM is used, the consequences of an IMSC Document Instance exceeding the HRM limits depend on the context:
The HRM is not intended to be used when the IMSC Document Instance is presented to end-users since:
This section is non-normative.
The HRM, illustrated in Figure 2, operates on a sequence of Intermediate Synchronic Documents Ei:
The model specifies a (hypothetical) time required for completely painting a non-empty ISD as a proxy for complexity. Painting includes clearing the Back Buffer, drawing region backgrounds, rendering glyphs, and copying glyphs. Complexity is then limited by requiring that painting of non-empty ISD En begins no earlier than the presentation time of the previous non-empty ISD Em and completes by the presentation time of En.
In contrast, there is no complexity involved in connecting and disconnecting the Front Buffer from the display, and thus no complexity associated with empty ISDs.
Whenever applicable, constraints are specified relative to Root Container Region dimensions, allowing subtitle sequences to be authored independently of the Related Video Object resolution.
To enable scenarios where the same glyphs are used in multiple successive Intermediate Synchronic Documents, e.g. to convey a CEA-608/708-style roll-up (see [CEA-608] and [CEA-708]), a Glyph Cache stores rendered glyphs across Intermediate Synchronic Documents, allowing glyphs to be copied into the Presentation Buffer instead of being rendered, which is the more costly operation.
The HRM permits a maximum rate of 12 Intermediate Synchronic Documents per second. This is ultimately limited by the BDraw parameter and is intended to capture processing and presentation overhead. When converting a [CEA-608] signal to IMSC, it is therefore impossible to create IMSC Document Instances that generate an Intermediate Synchronic Document for every [CEA-608] packet, since packets are sampled at the video field rate. It is instead preferable to coalesce sequences of [CEA-608] packets into longer groupings, such as words, phrases, complete lines or paragraphs, before creating an IMSC Document Instance, and let the presentation processor perform any desired animation, e.g., a typewriter effect.
Each of the terms Presentation Compositor, Glyph Renderer and Glyph Copier is defined by the algorithmic requirements specified for it in this specification.
The HRM algorithm processes a sequence of Intermediate Synchronic Documents Ei.
Each successive non-empty ISD En is rendered by the Presentation Compositor using the following steps in order:
The Presentation Compositor begins rendering En:
The Presentation Compositor never begins rendering an ISD more than IPD ahead of its presentation time.
The duration DUR(En) for painting an Intermediate Synchronic Document En in the Back Buffer is given by:
DUR(En) = S(En) / BDraw + DURT(En)
where
The contents of the Back Buffer are transferred instantaneously to the Front Buffer at the presentation time of a non-empty ISD En, making the latter available for display.
The Front Buffer is:
It is possible for the contents of the Front Buffer to never be displayed. This can happen, for example, if the Back Buffer is copied twice to the Front Buffer between two consecutive video frame boundaries of the Related Video Object.
It SHALL be an error for the Presentation Compositor to fail to complete painting the pixels of non-empty ISD En before its presentation time.
The following table specifies the values of IPD and BDraw.
Parameter | Initial value
--- | ---
Initial Painting Delay (IPD) | 1 s
Normalized background drawing performance factor (BDraw) | 12 s⁻¹
BDraw effectively sets a limit on filling regions. For example, assuming that the Root Container Region is ultimately rendered at 1920×1080 resolution, a BDraw of 12 s⁻¹ would correspond to a fill rate of 1920 × 1080 × 12 per second ≈ 23.7 × 2²⁰ pixels s⁻¹.
IPD effectively sets a limit on the complexity of any given Intermediate Synchronic Document.
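The interaction of these parameters can be illustrated with a short, non-normative sketch in Python. It checks the timing constraint described above for a sequence of non-empty ISDs, assuming that painting of each non-empty ISD begins at the later of the previous non-empty ISD's presentation time and IPD before its own presentation time, and it takes S(En) and DURT(En), defined in the following sections, as inputs. The function and variable names are illustrative only, not part of this specification.

```python
IPD = 1.0     # Initial Painting Delay, seconds
BDRAW = 12.0  # Normalized background drawing performance factor, 1/s

def check_hrm_timing(isds):
    """isds: list of (presentation_time, s, durt) tuples for the non-empty ISDs,
    in presentation order, where s = S(En) and durt = DURT(En)."""
    prev_pt = float("-inf")  # no previous non-empty ISD yet (assumption)
    for pt, s, durt in isds:
        begin = max(prev_pt, pt - IPD)   # never more than IPD ahead of pt
        dur = s / BDRAW + durt           # DUR(En) = S(En)/BDraw + DURT(En)
        if begin + dur > pt:
            raise ValueError(f"error: painting of the ISD presented at {pt}s "
                             "cannot complete before its presentation time")
        prev_pt = pt
```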
The total normalized drawing area S(En) for Intermediate Synchronic Document En is given by:
S(En) = CLEAR(En) + PAINT(En)
where CLEAR(En) = 1.
To ensure consistency of the Back Buffer, a new Intermediate Synchronic Document requires clearing of the Root Container Region.
PAINT(En) is the normalized area to be painted for all regions that are used in Intermediate Synchronic Document En according to:
PAINT(En) = ∑Ri∈Rp NSIZE(Ri) ∙ NBG(Ri)
where Rp is the set of presented regions in the Intermediate Synchronic Document En.
NSIZE(Ri) is given by:
NSIZE(Ri) = (width of Ri ∙ height of Ri) ÷ (Root Container Region height ∙ Root Container Region width)
For a region Ri with tts:extent="250px 50px" within a Root Container Region with tts:extent="1920px 1080px", NSIZE(Ri) = (250 ∙ 50) ÷ (1920 ∙ 1080) ≈ 0.00603.
NBG(Ri) is the total number of elements within the tree rooted at region Ri that satisfy the following criteria:
- the element is a region, body, div, p or span element; and
- the computed value of tts:backgroundColor for the element is not 0.
An element and its parent that satisfy the criteria above and share identical computed values of tts:backgroundColor are counted as two distinct elements for the purpose of computing NBG(Ri).
The set element is not included in the computation of NBG(Ri). While it can affect the computed values of tts:backgroundColor, it is removed during Intermediate Synchronic Document construction.
In the context of this section, a glyph is a tuple consisting of (i) one character and (ii) the computed values of the following style properties:
tts:color
tts:fontFamily
tts:fontSize
tts:fontStyle
tts:fontWeight
tts:textDecoration
tts:textOutline
tts:textShadow
In the case where a property is prohibited in a profile of [IMSC], the computed value of the property specified in [TTML2] can be used.
The Hypothetical Render Model defines a one-to-one mapping between characters and glyphs (using the definition of glyph from this document). While a one-to-one mapping between code points and glyphs (using the definition of glyph from [i18n-glossary]) is common in some scripts (such as the Latin script), the actual relationship is more complex. Some scripts, such as Arabic, use different glyphs for a given character, depending on its position in a word. Some scripts require combining marks or use a sequence of code points to form a glyph. Cases exist where a given sequence of code points can have different glyph representations depending on context. This complexity is accounted for by reducing the performance of the Glyph Cache for scripts where a one-to-one mapping is not the general rule (see GCpy below).
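Conceptually, a glyph as defined here behaves like a cache key. The following non-normative sketch models it as a frozen (hashable) dataclass over the character and the computed style values listed above; the field types and example values are assumptions made purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GlyphKey:
    """A glyph, as defined by the HRM: one character plus the computed values
    of the style properties that affect its rendered appearance."""
    character: str        # a single character (code point)
    color: str            # computed tts:color
    font_family: str      # computed tts:fontFamily
    font_size: str        # computed tts:fontSize
    font_style: str       # computed tts:fontStyle
    font_weight: str      # computed tts:fontWeight
    text_decoration: str  # computed tts:textDecoration
    text_outline: str     # computed tts:textOutline
    text_shadow: str      # computed tts:textShadow

# Because the dataclass is frozen (hashable), equal keys identify the same
# glyph, e.g. when checking the Glyph Cache:
# cache = {GlyphKey("a", "#ffffffff", "default", "1c", "normal",
#                   "normal", "none", "none", "none")}
```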
Iterating through each character in the character content of each presented region of Intermediate Synchronic Document En, for the glyph associated with that character, the Presentation Compositor:
The duration DURT(En) for rendering the text of an Intermediate Synchronic Document En in the Back Buffer is as follows:
DURT(En) = ∑gi∈Γr NRGA(gi) / Ren(gi) + ∑gj∈Γc NRGA(gj) / GCpy
where
The Rendered Glyph Area NRGA(gi) of a glyph gi is given by:
NRGA(gi) = (fontSize of gi as a decimal fraction of Root Container Region height)²
NRGA(gi) does not take into account decorations (e.g. underline), effects (e.g. outline) or the actual typographical glyph aspect ratio. An implementation can determine actual cache size needs based on worst-case glyph size complexity.
At the presentation time of Intermediate Synchronic Document En, perform the following steps in order:
It SHALL be an error if the sum of NRGA(gi) over all glyphs flagged retain in the Glyph Cache is at any time larger than the Normalized Glyph Cache Size (NGBS).
The abbreviation NGBS reflects the name of the Glyph Cache from earlier editions of the specification.
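A non-normative sketch of the text-duration and cache-size checks follows, reusing the GlyphKey sketch above. It assumes a glyph is copied when it is already held in the Glyph Cache from the preceding Intermediate Synchronic Document and rendered otherwise, and that every glyph used by En is flagged retain for the next ISD; per-glyph selection of GCpy and Ren by script property is left out, with the Latin-script values from the tables below used as defaults.

```python
from typing import Dict, Set, Tuple

NGBS = 1.0  # Normalized Glyph Cache Size (value from the table below)

def nrga(font_size_fraction: float) -> float:
    """NRGA(gi): (fontSize as a fraction of Root Container Region height) squared."""
    return font_size_fraction ** 2

def durt_and_cache(glyphs: Dict["GlyphKey", float],
                   cache: Set["GlyphKey"],
                   gcpy: float = 12.0,
                   ren: float = 1.2) -> Tuple[float, Set["GlyphKey"]]:
    """glyphs maps each glyph used in En to its fontSize fraction.
    Returns DURT(En) and the glyphs retained in the Glyph Cache."""
    total = 0.0
    for glyph, size in glyphs.items():
        if glyph in cache:
            total += nrga(size) / gcpy  # copied by the Glyph Copier
        else:
            total += nrga(size) / ren   # rendered by the Glyph Renderer
    retained = set(glyphs)              # assumption: retain every glyph used by En
    if sum(nrga(glyphs[g]) for g in retained) > NGBS:
        raise ValueError("error: retained glyphs exceed the Normalized Glyph Cache Size")
    return total, retained
```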
Unless specified otherwise, the following tables specify the values of GCpy, Ren and NGBS.
Normalized glyph copy performance factor (GCpy)

Script property, as defined at [UAX24], for the character of gi | GCpy
--- | ---
Latin, Greek, Cyrillic, Hebrew or Common | 12
any other value | 3

Text rendering performance factor Ren(Gi)

Script property, as defined at [UAX24], for the character of gi | Ren(Gi)
--- | ---
Han, Katakana, Hiragana, Bopomofo or Hangul | 0.6
any other value | 1.2

Normalized Glyph Cache Size (NGBS)

NGBS
---
1
While DURT(En) is not affected, the choice of font by the presentation processor can increase actual rendering complexity at time of presentation. For instance, a cursive font might select different glyphs for a given grapheme (in order to maintain joining or for the start/end of the word) even in the Latin script. Conversely, the rendering of scripts that fall in the any other value category can in practice achieve performance comparable to, say, the Latin script.
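The figures used in the notes below can be reproduced with a few lines of arithmetic; the following non-normative snippet simply restates those calculations using the parameter values given above.

```python
# Glyph Cache capacity when every glyph is one cell (1c) high out of 15 rows:
print(1 / (1 / 15) ** 2)           # 225.0 distinct glyphs

# Maximum rate at which 160 glyphs at 5% of the Root Container Region height
# can be moved when only a CLEAR operation and glyph copies are needed
# (BDraw = GCpy = 12 /s):
print(12 / (1 + 160 * 0.05 ** 2))  # ≈ 8.57 times per second

# Maximum sustained rendering rate for glyphs at 10% of the Root Container
# Region height (NRGA = 0.01) with Ren = 1.2 /s:
print(1.2 / 0.01)                  # 120.0 glyphs per second
```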
NGBS effectively sets a limit on the number of glyphs that can be retained in the Glyph Cache. For example, if glyphs with a tts:fontSize of 1c are used and the cell resolution is 15 rows high, the font size relative to the Root Container Region height is 1/15, and the maximum number of distinct glyphs that can be cached is 1 ÷ (1/15)² = 225 glyphs.
GCpy effectively sets a limit on animating text. For example, assuming that the Root Container Region is ultimately rendered at 1920×1080 resolution and no regions need to have background color painted (so only a CLEAR(En) operation is required for the normalized drawing area for the Intermediate Synchronic Document), a GCpy and BDraw of 12 s⁻¹ would mean that a group of 160 glyphs with a tts:fontSize equal to 5% of the Root Container Region height could be moved at most approximately 12 s⁻¹ ÷ (1 + (160 × 0.05²)) ≈ 8.6 times per second.
Ren(Gi) effectively sets a limit on the text rendering rate. For example, assuming that the Root Container Region is ultimately rendered at a 1920×1080 resolution, a Ren(Gi) of 1.2 s⁻¹ would mean that at most 120 glyphs with a fontSize of 108 px (10% of 1080 px and NRGA(gi) = 0.01) could be rendered every second.
This section is non-normative.
In a system where IMSC Document Instances are expected to conform to the Hypothetical Render Model, an IMSC Document Instance that does not conform to the Hypothetical Render Model might negatively impact accessibility during presentation of the IMSC Document Instance and its associated content.
This specification does not attempt to model any additional complexity for presentation processors that might arise due to the user customisation of presentation, for example as described by [media-accessibility-reqs]; such user customisation is not defined by [IMSC].
Implementers of presentation processors that support user customisation of presentation should ensure that those processors are able to present IMSC Document Instances that conform to the Hypothetical Render Model, even if the customisation effectively increases the complexity of presentation.
This section is non-normative.
This specification has no inherent security or privacy implications.
The algorithm defined within this specification is used for static analysis of a resource. This specification does not define any protocol or interface for obtaining such a resource, and it does not define any interface for exposing the results of the analysis. No personal or sensitive information is processed as part of the algorithm, other than any such information that might happen to be part of the IMSC Document Instance being analysed. No information is exposed by the algorithm to any origin. No scripts are loaded or processed as part of the algorithm and no links to external resources are dereferenced.
Implementers of this specification should capture and meet privacy and security requirements for their intended application. For example, an implementation could, when reporting on an error encountered during processing of an IMSC Document Instance, include a section of the content of that IMSC Document Instance to elaborate the error. If that content could include sensitive or personal information, the implementation should ensure that any such output is provided using appropriately secure protocols. No such reporting is defined or required by this specification.
This section is non-normative.
This specification does not define how, or even if, errors should be reported.
For example, an implementation could stop on the first error encountered, or continue to process the IMSC Document Instance and report every error. Or an implementation could exit with an appropriate status code without reporting any details at all.
This specification does not define any runtime exceptions, or how such exceptions should be handled.
This section is non-normative.
The editor acknowledges the current and former members of the Timed Text Working Group, the members of other W3C Working Groups, and industry experts in other forums who have contributed directly or indirectly to the process or content of this document.
The editor wishes to especially acknowledge the following contributions by members: Nigel Megitt (British Broadcasting Corporation) and Atsushi Shimono (W3C).
The editor also wishes to acknowledge Cyril Concolato (Netflix), Michael Dolan (Invited Expert) and Paul Londino (Warner Bros. Discovery) for contributing content producing implementations to the implementation report.
This section is non-normative.
In order to allow short (less than 100 ms) gaps between subtitles, which is common practice, the complexity of presenting empty ISDs has been reduced to zero: instead of being drawn into the Back Buffer, an empty ISD merely disconnects the Front Buffer from the display while it is presented.
Details at: https://github.com/w3c/imsc-hrm/issues/49
The first Intermediate Synchronic Document is no longer treated differently and incurs a cost for clearing the Back Buffer.
Details at: https://github.com/w3c/imsc-hrm/issues/49
Details at: https://github.com/w3c/imsc-hrm/issues/38
Details at: https://github.com/w3c/imsc-hrm/issues/39
Support for IMSC Image Profile, which was an at-risk feature, was removed due to insufficient demonstrable implementation experience.
Details at: https://github.com/w3c/imsc-hrm/issues/63