W3C

Dubbing and Audio description Profiles of TTML2

W3C Candidate Recommendation Draft

This version:
https://www.w3.org/TR/2025/CRD-dapt-20250321/
Latest published version:
https://www.w3.org/TR/dapt/
Latest editor's draft:
https://w3c.github.io/dapt/
History:
https://www.w3.org/standards/history/dapt/
Commit history
Implementation report:
https://www.w3.org/wiki/TimedText/DAPT_Implementation_Report
Editors:
Cyril Concolato (Netflix)
Nigel Megitt (British Broadcasting Corporation)
Feedback:
GitHub w3c/dapt (pull requests, new issue, open issues)
public-tt@w3.org with subject line [dapt] … message topic … (archives)

Copyright © 2025 World Wide Web Consortium. W3C® liability, trademark and document use rules apply.


Abstract

This specification defines DAPT, a TTML-based file format for the exchange of timed text content in dubbing and audio description workflows.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C standards and drafts index at https://www.w3.org/TR/.

This document incorporates a registry section and defines registry tables, as defined in the [w3c-process] requirements for w3c registries. Updates to the document that only change registry tables can be made without meeting other requirements for Recommendation track updates, as set out in Updating Registry Tables; requirements for updating those registry tables are normatively specified within H. Registry Section.

Please see the Working Group's implementation report.

For this specification to exit the CR stage, at least 2 independent implementations of every feature defined in this specification but not already present in [TTML2] need to be documented in the implementation report. The Working Group does not require that implementations are publicly available but encourages them to be so.

A list of the substantive changes applied since the initial Working Draft is found at substantive-changes-summary.txt.

The Working Group has identified the following at-risk features:

Issue 218: At-risk: support for `src` attribute in `<audio>` for external resource (PR-must-have)

Possible resolution to #113.

Issue 219: At-risk: support for `<source>` element child of `<audio>` for external resource (PR-must-have)

Possible resolution to #113.

Issue 220: At-risk: support for `src` attribute of `<audio>` element pointing to embedded resource (PR-must-have)

Possible resolution to #114 and #115.

The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

Issue 221: At-risk: support for `<source>` child of `<audio>` element pointing to embedded resource (PR-must-have)

Possible resolution to #114 and #115.

The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

Issue 222: At-risk: support for inline audio resources (PR-must-have)

Possible resolution to #115.

Issue 223: At-risk: each of the potential values of `encoding` in `<data>` (PR-must-have)

Possible resolution to #117.

Issue 224: At-risk: support for the `length` attribute on `<data>` (PR-must-have)

Possible resolution to #117.

Issue 239: At-risk: Script Event Grouping and Script Event Mapping (PR-must-have)

Support for the #scriptEventGrouping and #scriptEventMapping features, together, is at risk pending implementer feedback.

At-risk features may be removed before advancement to Proposed Recommendation.

This document was published by the Timed Text Working Group as a Candidate Recommendation Draft using the Recommendation track.

Publication as a Candidate Recommendation does not imply endorsement by W3C and its Members. A Candidate Recommendation Draft integrates changes from the previous Candidate Recommendation that the Working Group intends to include in a subsequent Candidate Recommendation Snapshot.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Future updates to this specification may incorporate new features.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

1. Scope

This specification defines a text-based profile of the Timed Text Markup Language version 2.0 [TTML2] intended to support dubbing and audio description workflows worldwide, to meet the requirements defined in [DAPT-REQS], and to permit usage of visual presentation features within [TTML2] and its profiles, for example those in [TTML-IMSC1.2].

2. Introduction

This section is non-normative.

2.1 Transcripts and Scripts

In general usage, one meaning of the word script is the written text of a film, television programme, play etc. A script can be either a record of the completed production, also known as a transcript, or a plan for a production yet to be created. In this document, we use domain-specific terms, and define more specifically that:

The term DAPT script is used generically to refer to both transcripts and scripts, and is a point of conformance to the formal requirements of this specification. DAPT Scripts consist of timed text and associated metadata, such as the character speaking.

In dubbing workflows, a transcript is generated and translated to create a script. In audio description workflows, a transcript describes the video image, and is then used directly as a script for recording an audio equivalent.

DAPT is a TTML-based format for the exchange of transcripts and scripts (i.e. DAPT Scripts) among authoring, prompting and playback tools in the localization and audio description pipelines. A DAPT document is a serializable form of a DAPT Script designed to carry pertinent information for dubbing or audio description such as type of DAPT script, dialogue, descriptions, timing, metadata, original language transcribed text, translated text, language information, and audio mixing instructions, and to be extensible to allow user-defined annotations or additional future features.

This specification defines the data model for DAPT scripts and its representation as a [TTML2] document (see 4. DAPT Data Model and corresponding TTML syntax) with some constraints and restrictions (see 5. Constraints).

A DAPT script is expected to be used to make audio visual media accessible or localized for users who cannot understand it in its original form, and to be used as part of the solution for meeting user needs involving transcripts, including accessibility needs described in [media-accessibility-reqs], as well as supporting users who need dialogue translated into a different language via dubbing.

Every part of the DAPT script content is required to be marked up with some indication of what it represents in the related media, via the Represents property; likewise the DAPT Script as a whole is required to list all the types of content that it represents, for example whether it represents audio content or visual content, and if visual, whether it represents text or non-text etc. A registry of hierarchical content descriptors is provided.

The authoring workflow for both dubbing and audio description involves similar stages, that share common requirements as described in [DAPT-REQS]. In both cases, the author reviews the content and writes down what is happening, either in the dialogue or in the video image, alongside the time when it happens. Further transformation processes can change the text to a different language and adjust the wording to fit precise timing constraints. Then there is a stage in which an audio rendering of the script is generated, for eventual mixing into the programme audio. That mixing can occur prior to distribution, or in the player directly.

2.1.1 Dubbing scripts

The dubbing process, which consists in creating a dubbing script, is a complex multi-step process involving:

  • Transcribing and timing the dialogue in its own language from a completed programme to create a transcript;
  • Notating dialogue with character information and other annotations;
  • Generating localization notes to guide further adaptation;
  • Translating the dialogue to a target language script;
  • Adapting the translation for dubbing, for example matching the actor's lip movements in the case of dubs.

A dubbing script is a transcript or script (depending on workflow stage) used for recording translated dialogue to be mixed with the non-dialogue programme audio, to generate a localized version of the programme in a different language, known as a dubbed version, or dub for short.

Dubbing scripts can be useful as a starting point for creation of subtitles or closed captions in alternate languages. This specification is designed to facilitate the addition of, and conversion to, subtitle and caption documents in other profiles of TTML, such as [TTML-IMSC1.2], for example by permitting subtitle styling syntax to be carried in DAPT documents. Alternatively, styling can be applied to assist voice artists when recording scripted dialogue.

2.1.2 Audio Description scripts

Creating audio description content is also a multi-stage process. An audio description, also known as video description or in [media-accessibility-reqs] as described video, is an audio service to assist viewers who cannot fully see a visual presentation to understand the content. It is the result of mixing the main programme audio with the audio rendition of each description, authored to be timed when it does not clash with dialogue, to deliver an audio description mixed audio track. Main programme audio refers to the audio associated with the programme prior to any further mixing. A description is a set of words that describes an aspect of the programme presentation, suitable for rendering into audio by means of vocalisation and recording or used as a text alternative source for text to speech translation, as defined in [WCAG22]. More information about what audio description is and how it works can be found at [BBC-WHP051].

Writing the audio description script typically involves:

  • watching the video content of the programme, or series of programmes,
  • identifying the key moments during which there is an opportunity to speak descriptions,
  • writing the description text to explain the important visible parts of the programme at that time,
  • creating an audio version of the descriptions, either by recording a human actor or using text to speech,
  • defining mixing instructions (applied using [TTML2] audio styling) for combining the audio with the programme audio.

The audio mixing can occur prior to distribution of the media, or in the client. If the audio description script is delivered to the player, the text can be used to provide an alternative rendering, for example on a Braille display, or using the user's configured screen reader.

2.1.3 Other uses

DAPT Scripts can be useful in other workflows and scenarios. For example, Original language transcripts could be used as:

  • the output format of a speech to text system, even if not intended for translation, or for the production of subtitles or captions;
  • a document known in the broadcasting industry as a "post production script", used primarily for preview, editorial review and sales purposes;

Both Original language transcripts and Translated transcripts could be used as:

  • an accessible transcript presented alongside audio or video in a web page or application; in this usage, the timings could be retained and used for synchronisation with, or navigation within, the media or discarded to present a plain text version of the entire timeline.

2.2 Example documents

2.2.1 Basic document structure

The top level structure of a document is as follows:

  • The <tt> root element in the namespace http://www.w3.org/ns/ttml indicates that this is a TTML document and the ttp:contentProfiles attribute indicates that it adheres to the DAPT content profile defined in this specification.
  • The daptm:scriptRepresents attribute indicates what the contents of the document are an alternative for, within the original programme.
  • The daptm:scriptType attribute indicates the type of transcript or script but in this empty example, it is not relevant, since only the structure of the document is shown.
  • The daptm:langSrc attribute indicates the default text language source, for example the original language of the content, while the xml:lang attribute indicates the default language in this script, which in this case is the same. Both of these attributes are inherited and can be overridden within the content of the document.

The structure is applicable to all types of DAPT scripts, dubbing or audio description.

Example 1
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="en"
    daptm:scriptRepresents="audio"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>
      <!-- Additional metadata may be placed here -->
      <!-- Any characters must be defined here as a set of ttm:agent elements -->
    </metadata>
    <styling>
      <!-- Styling is optional and consists of a set of style elements -->
    </styling>
    <layout>
      <!-- Layout is optional and consists of a set of region elements -->
    </layout>
  </head>
  <body>
    <!-- Content goes here and consists of a div for each Script Event -->
    <div xml:id="d1" begin="..." end="..." daptm:represents="audio.dialogue">
      <p>
        <!-- Text blocks are contained in p elements -->
      </p>
      <p xml:lang="fr" daptm:langSrc="en">
        <!-- Translation text is related to the source language for the translation -->
      </p>
    </div>
  </body>
</tt>

The following examples correspond to the timed text transcripts and scripts produced at each stage of the workflow described in [DAPT-REQS].

The first example shows an early stage transcript in which timed opportunities for descriptions or transcriptions have been identified but no text has been written; the daptm:represents attribute present on the <body> element here is inherited by the <div> elements since they do not specify a different value:

Example 2
...
<body daptm:represents="...">
  <div xml:id="id1" begin="10s" end="13s"></div>
  <div xml:id="id2" begin="18s" end="20s"></div>
</body>
...

The following examples will demonstrate different uses in dubbing and audio description workflows.

2.2.2 Audio Description Examples

When descriptions are added this becomes a Pre-Recording Script. Note that in this case, to reflect that most of the audio description content transcribes the video image where there is no inherent language, the Text Language Source, represented by the daptm:langSrc attribute, is set to the empty string at the top level of the document. It would be semantically equivalent to omit the attribute altogether, since the default value is the empty string:

Example 3
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc=""
    daptm:scriptRepresents="visual.nonText"
    daptm:scriptType="preRecording">
  <body>
    <div begin="10s" end="13s" xml:id="a1" daptm:represents="visual.nonText">
      <p>
        A woman climbs into a small sailing boat.
      </p>
    </div>
    <div begin="18s" end="20s" xml:id="a2" daptm:represents="visual.nonText">
      <p>
        The woman pulls the tiller and the boat turns.
      </p>
    </div>
  </body>
</tt>

Audio description content often includes text present in the visual image, for example if the image contains a written sign, a location, etc. The following example demonstrates such a case: Script Represents is extended to show that the script's contents represent textual visual information in addition to non-textual visual information. Here a more precise value of Represents is specified on the Script Event to reflect that the text is in fact a location, which is allowed because the more precise value is a sub-type of the new value in Script Represents. Finally, since the text has an inherent language, the Text Language Source is set to reflect that language.

Example 4
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc=""
    daptm:scriptRepresents="visual.nonText visual.text"
    daptm:scriptType="preRecording">
  <body>
    <div begin="7s" end="8.5s" xml:id="at1" daptm:represents="visual.text.location" daptm:langSrc="en">
      <p>
        The Lake District, England
      </p>
    </div>
    <div begin="10s" end="13s" xml:id="a1" daptm:represents="visual.nonText">
      <p>
        A woman climbs into a small sailing boat.
      </p>
    </div>
    <div begin="18s" end="20s" xml:id="a2" daptm:represents="visual.nonText">
      <p>
        The woman pulls the tiller and the boat turns.
      </p>
    </div>
  </body>
</tt>

After creating audio recordings, if not using text to speech, instructions for playback mixing can be inserted. For example, the gain of "received" audio can be changed before mixing in the audio played from inside the <span> element, smoothly animating the value on the way in and returning it on the way out:

Example 5
<tt ... daptm:scriptRepresents="visual.nonText" daptm:scriptType="asRecorded" xml:lang="en" daptm:langSrc="">
  ...
  <div begin="25s" end="28s" xml:id="a3" daptm:represents="visual.nonText">
    <p>
      <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
      <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
      <span begin="0.3s" end="2.7s">
        <audio src="clip3.wav"/>
        The sails billow in the wind.
      </span>
    </p>
  </div>
...

At the document level, the daptm:scriptRepresents attribute indicates that the document represents both visual text and visual non-text content in the related media. It is possible that there are no Script Events that actually represent visual text, for example because there is no text in the video image.

In the above example, the <div> element's begin attribute defines the time that is the "syncbase" for its child, so the times on the <animate> and <span> elements are relative to 25s here. The first <animate> element drops the gain from 1 to 0.39 over 0.3s, freezing that value after it ends, and the second one raises it back in the final 0.3s of this description. Then the <span> element is timed to begin only after the first audio dip has finished.

If the audio recording is long and just a snippet needs to be played, that can be done using clipBegin and clipEnd. If we just want to play the part of the audio file from 5s to 8s, it would look like:

Example 6
...
<span>
  <audio src="long_audio.wav" clipBegin="5s" clipEnd="8s"/>
  A woman climbs into a small sailing boat.
</span>
...

Or audio attributes can be added to trigger the text to be spoken:

Example 7
...
<div begin="18s" end="20s" xml:id="a2">
  <p>
    <span tta:speak="normal">
      The woman pulls the tiller and the boat turns.
    </span>
  </p>
</div>
...

It is also possible to embed the audio directly, so that a single document contains the script and recorded audio together:

Example 8
...
<div begin="25s" end="28s" xml:id="a3">
  <p>
    <animate begin="0.0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
    <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
    <span begin="0.3s" end="2.7s">
      <audio>
        <source>
          <data type="audio/wave">
            [base64-encoded audio data]
          </data>
        </source>
      </audio>
      The sails billow in the wind.
    </span>
  </p>
</div>
...

2.2.3 Dubbing Examples

From the basic structure of Example 1, transcribing the audio produces an original language dubbing transcript, which can look as follows. No specific style or layout is defined, and here the focus is on the transcription of the dialogue. Characters are identified within the <metadata> element. Note that the language and the text language source are defined using xml:lang and daptm:langSrc attributes respectively, which have the same value because the transcript is not translated.

Example 9
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="fr"
    daptm:langSrc="fr"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" xml:id="d1" daptm:represents="audio.dialogue">
      <p ttm:agent="character_1">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
    </div>
  </body>
</tt>

After translating the text, the document is modified. It includes translation text, and in this case the original text is preserved. The main document's default language is changed to indicate that the focus is on the translated language. The combination of the xml:lang and daptm:langSrc attributes is used to mark the text as being original or translated. In this case, they are present on both the <tt> and <p> elements to make the example easier to read, but it would also be possible to omit them in some cases, making use of the inheritance model:

Example 10
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="fr"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="translatedTranscript">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" xml:id="d1" ttm:agent="character_1" daptm:represents="audio.dialogue">
      <p xml:lang="fr" daptm:langSrc="fr"><!-- original -->
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
      <p xml:lang="en" daptm:langSrc="fr"><!-- translated -->
        <span>And thanks to that, we're gonna get rich.</span>
      </p>
    </div>
  </body>
</tt>

The process of adaptation, before recording, could adjust the wording and/or add further timing to assist in the recording. The daptm:scriptType attribute is also modified, as in the following example:

Example 11
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="en"
    daptm:langSrc="fr"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="preRecording">
  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">ASSANE</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <body>
    <div begin="10s" end="13s" xml:id="d1" ttm:agent="character_1" daptm:onScreen="ON_OFF" daptm:represents="audio.dialogue">
      <p xml:lang="fr" daptm:langSrc="fr">
        <span>Et c'est grâce à ça qu'on va devenir riches.</span>
      </p>
      <p xml:lang="en" daptm:langSrc="fr">
        <span begin="0s">And thanks to that,</span>
        <span begin="1.5s"> we're gonna get rich.</span>
      </p>
    </div>
  </body>
</tt>

3. Documentation Conventions

This document uses the following conventions:

4. DAPT Data Model and corresponding TTML syntax

This section specifies the data model for DAPT and its corresponding TTML syntax. In the model, there are objects which can have properties and be associated with other objects. In the TTML syntax, these objects and properties are expressed as elements and attributes, though it is not always the case that objects are expressed as elements and properties as attributes.

Figure 1 illustrates the DAPT data model, hyperlinking every object and property to its corresponding section in this document. Shared properties are shown in italics. All other conventions in the diagram are as per [uml].

[Figure: UML class diagram. A DAPT Script (Script Represents, Script Type, Default Language, optional Text Language Source) contains zero or more Characters (Character Identifier, Name, optional Talent Name) and zero or more Script Events (Script Event Identifier, Represents, optional Begin, End, Duration, On Screen). A Script Event contains zero or more Script Event Descriptions (Description, optional Description Type, optional Language) and zero or more Texts (Text content, optional Text Language Source, optional Language). A Text contains zero or more Audios, whose sub-types are Synthesized Audio (Rate, optional Pitch) and Audio Recording (Source [], Type [], optional Begin, End, Duration, In Time, Out Time), and zero or more Mixing Instructions (optional Gain, Pan, Begin, End, Duration, Fill).]
Figure 1 (Informative) Class diagram showing main entities in the DAPT data model.
Issue 116: Add non-inlined embedded audio resources to the Data Model? (question, PR-must-have)

See also #115 - if we are going to support non-inline embedded audio resources, should we make an object for them and add it into the Data Model?

4.1 DAPT Script

A DAPT Script is a transcript or script that corresponds to a document processed within an authoring workflow or processed by a client, and conforms to the constraints of this specification. It has properties and objects defined in the following sections: Script Represents, Script Type, Default Language, Text Language Source, Script Events and, for Dubbing Scripts, Characters.

A DAPT Document is a [TTML2] timed text content document instance representing a DAPT Script. A DAPT Document has the structure and constraints defined in this and the following sections.

Note

A [TTML2] timed text content document instance has a root <tt> element in the TT namespace.

4.1.1 Script Represents

The Script Represents property is a mandatory property of a DAPT Script which indicates which components of the related media object the contents of the document represent. The contents of the document could be used as part of a mechanism to provide an accessible alternative for those components.

Note

Script Events have a related property, Represents, and there are constraints about the permitted values of that property that are dependent on the values of Script Represents.

To represent this property, the daptm:scriptRepresents attribute MUST be present on the <tt> element, with a value conforming to the following syntax:

daptm:scriptRepresents
  : <content-descriptor> ( <lwsp>+ <content-descriptor> )*

<lwsp>  # as TTML2
Example 12

A dubbing script might have daptm:scriptRepresents="audio.dialogue".

An audio description script might have daptm:scriptRepresents="visual.nonText visual.text visual.dialogue".

A post-production script that could be the precursor to a hard of hearing subtitle document might have daptm:scriptRepresents="audio.dialogue audio.nonDialogueSounds".

4.1.2 Default Language

The Default Language is a mandatory property of a DAPT Script which represents the default language for the Text content of Script Events. This language may be one of the original languages or a Translation language. When it represents a Translation language, it may be the final language for which a dubbing or audio description script is being prepared, called the Target Recording Language, or it may be an intermediate, or pivot, language used in the workflow.

The Default Language is represented in a DAPT Document by the following structure and constraints:

  • the xml:lang attribute MUST be present on the <tt> element and its value MUST NOT be empty.

Note

All text content in a DAPT Script has a specified language. When multiple languages are used, the Default Language can correspond to the language of the majority of Script Events, to the language being spoken for the longest duration, or to a language arbitrarily chosen by the author.

Example 13
An Original Language Transcript of dialogue is prepared for a video containing dialogue in Danish and Swedish. The Default Language is set to Danish by setting xml:lang="da" on the <tt> element. Script Events that contain Swedish Text override this by setting xml:lang="sv" on the <p> element. Script Events that contain Danish Text can set the xml:lang attribute or omit it, since the inherited language is the Default Language of the document. In both cases the Script Events' Text objects are <p> elements that represent untranslated content that had an inherent language (in this case dialogue) and therefore set the daptm:langSrc attribute to their source language, implying that they are in the Original language.
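A minimal sketch of that arrangement (the document skeleton follows Example 1; the timings are illustrative and the dialogue text is elided):

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    xml:lang="da"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="originalTranscript">
  <body>
    <!-- Danish dialogue: inherits the Default Language xml:lang="da" -->
    <div begin="10s" end="13s" daptm:represents="audio.dialogue">
      <p daptm:langSrc="da"><span>...</span></p>
    </div>
    <!-- Swedish dialogue: overrides the Default Language on the p element -->
    <div begin="18s" end="20s" daptm:represents="audio.dialogue">
      <p xml:lang="sv" daptm:langSrc="sv"><span>...</span></p>
    </div>
  </body>
</tt>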

4.1.3 Script Type

The Script Type property is a mandatory property of a DAPT Script which describes the type of documents used in Dubbing and Audio Description workflows, among the following: Original Language Transcript, Translated Transcript, Pre-recording Script, As-recorded Script.

To represent this property, the daptm:scriptType attribute MUST be present on the <tt> element:

daptm:scriptType
  : "originalTranscript"
  | "translatedTranscript"
  | "preRecording"
  | "asRecorded"

The definitions of the types of documents and the corresponding daptm:scriptType attribute values are:

Editor's note

The following example is orphaned - move to the top of the section, before the enumerated script types?

Example 17
<tt daptm:scriptType="originalTranscript">
  ...
</tt>

4.1.4 Script Events

A DAPT Script MAY contain zero or more Script Event objects, each corresponding to dialogue, on screen text, or descriptions for a given time interval.

If any Script Events are present, the DAPT Document MUST have one <body> element child of the <tt> element.

4.1.5 Characters

A DAPT Script MAY contain zero or more Character objects, each describing a character that can be referenced by a Script Event.

If any Character objects are present, the DAPT Document MUST have one <head> element child of the <tt> element, and that <head> element MUST have at least one <metadata> element child.

Note

4.2 Character recommends that all the Character objects be located within a single <metadata> element parent, and in the case that there is more than one <metadata> element child of the <head> element, that the Character objects are located in the first such child.
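A sketch of this structure, reusing the character definition from Example 9 (the ttm prefix is bound as in that example):

<head>
  <metadata>
    <!-- all Character objects placed in the first metadata child of head -->
    <ttm:agent type="character" xml:id="character_1">
      <ttm:name type="alias">ASSANE</ttm:name>
    </ttm:agent>
  </metadata>
</head>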

4.1.6 Shared properties and Value Sets

Some of the properties in the DAPT data model are common to more than one object type, and carry the same semantics everywhere they occur. These shared properties are listed in this section.

Some of the value sets in DAPT are reused across more than one property, and have the same constraints everywhere they occur. These shared value sets are also listed in this section.

Editor's note

Would it be better to make a "Timed Object" class and subclass Script Event, Mixing Instruction and Audio Recording from it?

4.1.6.1 Timing Properties

The following timing properties define when the entities that contain them are active:

  • The Begin property defines when an object becomes active, and is relative to the active begin time of the parent object. DAPT Scripts begin at time zero on the media timeline.
  • The End property defines when an object stops being active, and is relative to the active begin time of the parent object.
  • The Duration property defines the maximum duration of an object.
    Note

    If both an End and a Duration property are present, the end time is the earlier of End and Begin + Duration, as defined by [TTML2].

Note
If any of the timing properties is omitted, the following rules apply, paraphrasing the timing semantics defined in [TTML2]:
  • The default value for Begin is zero, i.e. the same as the begin time of the parent object.
  • The default value for End is indefinite, i.e. it resolves to the same as the end time of the parent timed object, if there is one.
  • The default value for Duration is indefinite, i.e. the end time resolves to the same as the end time of the parent object.
Note

The end time of a DAPT Script is for practical purposes the end of the Related Media Object.
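An informal sketch of these semantics, using the timings of Example 5: a child's Begin is relative to its parent's active begin time.

<!-- the div is active from 25s to 28s on the media timeline -->
<div begin="25s" end="28s">
  <p>
    <!-- the span's times are relative to the div's begin time,
         so it is active from 25.3s to 27.7s on the media timeline -->
    <span begin="0.3s" end="2.7s">...</span>
  </p>
</div>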

4.1.6.2 <content-descriptor> values

The values permitted in the Script Represents and Represents properties depend on the <content-descriptor> syntactic definition and its associated registry table.

<content-descriptor> has a value conforming to the following syntax:

<content-descriptor>  # see registry table below
  : <descriptor-token> ( <descriptor-delimiter> <descriptor-token> )*

<descriptor-token>
  : (descriptorTokenChar)+

descriptorTokenChar  # xsd:NMtoken without the "."
  : NameStartChar | "-" | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

<descriptor-delimiter>
  : "."  # FULL STOP U+002E

<content-descriptor> has values that are delimiter-separated ordered lists of tokens.

A <content-descriptor> value B is a content descriptor sub-type (sub-type) of another <content-descriptor> value A if A's ordered list of descriptor-tokens is present at the beginning of B's ordered list of descriptor-tokens.

Example 18
Table demonstrating example values of <content-descriptor> and whether each is a sub-type of the other.

<content-descriptor> A | <content-descriptor> B | Is B a sub-type of A?
visual.text            | visual                 | No
visual.text            | visual.text            | Yes
visual.text            | visual.text.location   | Yes

For example, in this table, A could be one of the values listed in the Script Represents property, and B could be the value of a Represents property.

The permitted values for <content-descriptor> are either those listed in the following registry table, or can be user-defined.

Valid user-defined values MUST begin with x- or be sub-types of values in the content-descriptor registry table, where the first additional <descriptor-token> component begins with x-.
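For instance, a sketch of the second form (x-signage is a hypothetical user-defined token, not a registered value):

<!-- valid: visual.text is in the registry table, and the first
     additional descriptor-token begins with x- -->
<div daptm:represents="visual.text.x-signage">...</div>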

Registry table for the <content-descriptor> component whose Registry Definition is at H.2.2 <content-descriptor> registry table definition

<content-descriptor>    | Status      | Description | Example usage
audio                   | Provisional | Indicates that the DAPT content represents any part of the audio programme. | Dubbing, translation and hard of hearing subtitles and captions, pre- and post-production scripts
audio.dialogue          | Provisional | Indicates that the DAPT content represents verbal communication in the audio programme, for example, a spoken conversation. | Dubbing, translation and hard of hearing subtitles and captions, pre- and post-production scripts
audio.nonDialogueSounds | Provisional | Indicates that the DAPT content represents a part of the audio programme corresponding to sounds that are not verbal communication, for example, significant sounds, such as a door being slammed in anger. | Translation and hard of hearing subtitles and captions, pre- and post-production scripts
visual                  | Provisional | Indicates that the DAPT content represents any part of the visual image of the programme. | Audio Description
visual.dialogue         | Provisional | Indicates that the DAPT content represents verbal communication, within the visual image of the programme, for example, a signed conversation. | Dubbing or Audio Description, translation and hard of hearing subtitles and captions, pre- and post-production scripts
visual.nonText          | Provisional | Indicates that the DAPT content represents non-textual parts of the visual image of the programme, for example, a significant object in the scene. | Audio Description
visual.text             | Provisional | Indicates that the DAPT content represents textual content in the visual image of the programme, for example, a signpost, a clock, a newspaper headline, an instant message etc. | Audio Description
visual.text.title       | Provisional | A sub-type of visual.text where the text is the title of the related media. | Audio Description
visual.text.credit      | Provisional | A sub-type of visual.text where the text is a credit, e.g. the name of an actor. | Audio Description
visual.text.location    | Provisional | A sub-type of visual.text where the text indicates the location where the content is occurring. | Audio Description
4.1.6.3 Unique identifiers

Some entities in the data model include unique identifiers. A Unique Identifier has the following requirements:

  • it is unique within the DAPT Script, i.e. the value of a Unique Identifier can only be used one time within the document, regardless of which specific kind of identifier it is.

    If a Character Identifier has the value "abc" and a Script Event Identifier in the same document has the same value, that is an error.

  • its value has to conform to the requirements of Name as defined by [XML].

    Note

    It cannot begin with a digit, a combining diacritical mark (an accent), or any of the following characters:

        .    -    · (#xB7)    ‿ (#x203F)    ⁀ (#x2040)

    but those characters can be used elsewhere.

A Unique Identifier for an entity is expressed in a DAPT Document by an xml:id attribute on the corresponding element.

Note

The formal requirements for the semantics and processing of xml:id are defined in [xml-id].
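For example, a sketch of the uniqueness requirement (both identifiers are hypothetical): reusing a Character Identifier value as a Script Event Identifier is an error.

<ttm:agent type="character" xml:id="abc">...</ttm:agent>
...
<!-- error: "abc" is already used as a Character Identifier -->
<div xml:id="abc" begin="10s" end="13s">...</div>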

4.2 Character

This section is mainly relevant to Dubbing workflows.

A character in the programme can be described using a Character object which has the following properties:

A Character is represented in a DAPT Document by the following structure and constraints:

Note
As indicated in 5.2.1 Unrecognised vocabulary, ttm:agent elements can have foreign attributes and elements. This can be used to provide additional, proprietary character information.
Issue 44: Define DAPT-specific conformant implementation types (CR must-have)

We should define our own classes of conformant implementation types, to avoid using the generic "presentation processor" or "transformation processor" ones. We could link to them.
At the moment, I can think of the following classes:

  • DAPT Authoring Tool: tool that produces or consumes compliant DAPT documents. I don't think they map to TTML2 processors.
  • DAPT Audio Recorder/Renderer: tool that takes DAPT Audio Description scripts, e.g. with mixing instructions, and produces audio output, e.g. a WAVE file. I think it is a "presentation processor".
  • DAPT Validator: tool that verifies that a DAPT document complies with the specification. I'm not sure what it maps to in TTML2 terminology.

4.3 Script Event

A Script Event object represents dialogue, on screen text or audio descriptions to be spoken and has the following properties:

A Script Event is represented in a DAPT Document at the path /tt/body//div, with the following structure and constraints:

Issue 233: Consider improving identification of divs corresponding to script events (CR must-have)

Based on discussion at #216 (comment), I think we should have an explicit signal to indicate when a div represents a Script Event.

4.4 Text

The Text object contains text content typically in a single language. This language may be the Original language or a Translation language.

Text is defined as Original if it is any of:

  1. the same language as the dialogue that it represents in the original programme audio;
  2. a transcription of text visible in the programme video, in the same language as that text;
  3. an untranslated representation of non-dialogue sound;
  4. an untranslated description of the scene in the programme video.

Note
The language of an Original Text object can be different to the document's Default Language.

Text is defined as Translation if it is a representation of an Original Text object in a different language.

Text can be identified as being Original or Translation by inspecting its language and its Text Language Source together, according to the semantics defined in Text Language Source.

The source language of Translation Text objects and, where applicable, Original Text objects is indicated using the Text Language Source property.

A Text object may be styled.

Zero or more Mixing Instruction objects, used to modify the programme audio during the Text, MAY be present.

A Text object is represented in a DAPT Document by a <p> element at the path /tt/body//div/p, with the following constraints:

4.5 Text Language Source

The Text Language Source property is an annotation indicating the source language of a Text object, if applicable, or that the source content had no inherent language:

Text Language Source is an inheritable property.

The Text Language Source property is represented in a DAPT Document by a daptm:langSrc attribute with the following syntax, constraints and semantics:

daptm:langSrc
  : <empty-string> | <language-identifier>

<empty-string>
  : ""    # default

<language-identifier>  # valid BCP-47 language tag

Note

An example of the usage of Text Language Source in a document is present in the Text section.

Example 24
Table enumerating example values of the xml:lang and daptm:langSrc attributes for different Original transcript sources and their inherent languages.

Transcript source      | Inherent language of the transcript source | xml:lang | daptm:langSrc
In-image text          | English                                    | en       | en
Video image (non text) | none                                       | en       | empty
Sound effect           | none                                       | en       | empty
Dialogue               | Arabic                                     | ar       | ar

If any of these transcripts were translated, the resulting Text would have its daptm:langSrc attribute set to the computed value of the xml:lang attribute of the source.

For example, if the Arabic dialogue were translated into Japanese, it would result in xml:lang="ja" and daptm:langSrc="ar".
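A sketch of that last case in document form, following the pattern of Example 10 (the timing attributes are illustrative and the text is elided):

<div begin="10s" end="13s" daptm:represents="audio.dialogue">
  <!-- original Arabic dialogue -->
  <p xml:lang="ar" daptm:langSrc="ar"><span>...</span></p>
  <!-- Japanese translation; daptm:langSrc records the source language -->
  <p xml:lang="ja" daptm:langSrc="ar"><span>...</span></p>
</div>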

4.6 On Screen

The On Screen property is an annotation indicating the position in the scene relating to the subject of a Script Event, for example of the character speaking:

If omitted, the default value is "ON".

Note
When the daptm:represents attribute value begins with visual, the subject of each Script Event, i.e. what is being described, is expected to be in the video image; therefore the default of "ON" allows the property to be omitted in those cases without distortion of meaning.

The On Screen property is represented in a DAPT Document by a daptm:onScreen attribute on the <div> element, with the following constraints:

4.7 Represents

The Represents property indicates which component of the related media object the Script Event represents.

The Represents property is represented in a DAPT Document by a daptm:represents attribute, whose value MUST be a single <content-descriptor>.

The Represents property is inheritable. If it is absent from an element then its computed value is the computed value of the Represents property on its parent element, or, if it has no parent element, it is the empty string. If it is present on an element then its computed value is the value specified.

Note

Since there is no empty <content-descriptor>, this implies that an empty computed Represents property can never be valid; one way to construct a valid DAPT Document is to specify a Represents property on the DAPT Script so that it is inherited by all descendants that do not have a Represents property.

It is an error for a Represents property value not to be a content descriptor sub-type of at least one of the values in the Script Represents property.
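For instance, a sketch of this constraint, assuming daptm:scriptRepresents="visual.nonText visual.text" on the <tt> element as in Example 4:

<!-- valid: visual.text.location is a sub-type of visual.text -->
<div daptm:represents="visual.text.location">...</div>

<!-- error: audio.dialogue is not a sub-type of any Script Represents value -->
<div daptm:represents="audio.dialogue">...</div>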

4.8 Script Event Description

The Script Event Description object is an annotation providing a human-readable description of some aspect of the content of a Script Event. Script Event Descriptions can themselves be classified with a Description Type.

A Script Event Description object is represented in a DAPT Document by a <ttm:desc> element at the <div> element level.

Zero or more <ttm:desc> elements MAY be present.

Script Event Descriptions SHOULD NOT be empty.

Note

The Script Event Description does not need to be unique, i.e. it does not need to have a different value for each Script Event. For example a particular value could be re-used to identify in a human-readable way one or more Script Events that are intended to be processed together, e.g. in a batch recording.

The <ttm:desc> element MAY specify its language using the xml:lang attribute.

Note
In the absence of an xml:lang attribute the language of the Script Event Description is inherited from the parent Script Event object.
Example 25
...
<body>
  <div begin="10s" end="13s" xml:id="a1">
    <ttm:desc>Scene 1</ttm:desc>
    <p xml:lang="en"><span>A woman climbs into a small sailing boat.</span></p>
    <p xml:lang="fr" daptm:langSrc="en"><span>Une femme monte à bord d'un petit bateau à voile.</span></p>
  </div>
  <div begin="18s" end="20s" xml:id="a2">
    <ttm:desc>Scene 1</ttm:desc>
    <p xml:lang="en"><span>The woman pulls the tiller and the boat turns.</span></p>
    <p xml:lang="fr" daptm:langSrc="en"><span>La femme tire sur la barre et le bateau tourne.</span></p>
  </div>
</body>
...

Each Script Event Description can be annotated with one or more Description Types to categorise further the purpose of the Script Event Description.

Each Description Type is represented in a DAPT Document by a daptm:descType attribute on the <ttm:desc> element.

The <ttm:desc> element MAY have zero or one daptm:descType attribute. The daptm:descType attribute is defined below.

daptm:descType : string

The permitted values for daptm:descType are either those listed in the following registry table, or can be user-defined:

Registry table for the daptm:descType attribute whose Registry Definition is at H.2.1 daptm:descType registry table definition

daptm:descType    | Status      | Description                                                       | Notes
pronunciationNote | Provisional | Notes for how to pronounce the content.                           |
scene             | Provisional | Contains a scene identifier.                                      |
plotSignificance  | Provisional | Defines a measure of how significant the content is to the plot.  | Contents are undefined and may be low, medium or high, or a numerical scale.

Valid user-defined values MUST begin with x-.

Example 26
...
<body>
  <div begin="10s" end="13s" xml:id="a123">
    <ttm:desc daptm:descType="pronunciationNote">[oːnʲ]</ttm:desc>
    <p>Eóin looks around at the other assembly members.</p>
  </div>
</body>
...

Amongst a sibling group of <ttm:desc> elements there are no constraints on the uniqueness of the daptm:descType attribute, however it may be useful as a distinguisher as shown in the following example.

Example 27
...
<body>
  <div begin="10s" end="13s" xml:id="a1">
    <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
    <ttm:desc daptm:descType="plotSignificance">High</ttm:desc>
    <p xml:lang="en"><span>A woman climbs into a small sailing boat.</span></p>
    <p xml:lang="fr" daptm:langSrc="en"><span>Une femme monte à bord d'un petit bateau à voile.</span></p>
  </div>
  <div begin="18s" end="20s" xml:id="a2">
    <ttm:desc daptm:descType="scene">Scene 1</ttm:desc>
    <ttm:desc daptm:descType="plotSignificance">Low</ttm:desc>
    <p xml:lang="en"><span>The woman pulls the tiller and the boat turns.</span></p>
    <p xml:lang="fr" daptm:langSrc="en"><span>La femme tire sur la barre et le bateau tourne.</span></p>
  </div>
</body>
...

4.9 Audio

An Audio object is used to specify an audio rendering of a Text. The audio rendering can either be a recorded audio resource, as an Audio Recording object, or a directive to synthesize a rendering of the text via a text to speech engine, which is a Synthesized Audio object. Both are types of Audio object.

It is an error for an Audio not to be in the same language as its Text.

A presentation processor that supports audio plays or inserts the Audio at the specified time on the related media object's timeline.

Note

The Audio object is "abstract": it can only exist as one of its sub-types, Audio Recording or Synthesized Audio.

4.9.1 Audio Recording

An Audio Recording is an Audio object that references an audio resource. It has the following properties:

  • One or more alternative Sources, each of which is either 1) a link to an external audio resource or 2) an embedded audio recording;
  • For each Source, one mandatory Type that specifies the type ([MIME-TYPES]) of the audio resource, for example audio/basic;
  • An optional Begin property, an optional End property and an optional Duration property that together define the Audio Recording's time interval in the programme timeline, in relation to the parent element's time interval;
  • An optional In Time and an optional Out Time property that together define a temporal subsection of the audio resource;

    The default In Time is the beginning of the audio resource.

    The default Out Time is the end of the audio resource.

    If the temporal subsection of the audio resource is longer than the duration of the Audio Recording's time interval, then playback MUST be truncated to end when the Audio Recording's time interval ends.

    Note
    "Extended descriptions" (known in [media-accessibility-reqs] as "Extended video descriptions") are longer than the allocated time within the related media. A presentation processor that supports extended descriptions can allow the effective play rate of the audio resource to differ from the play rate of the related media object so that the resulting interval has a long enough duration to accommodate the audio resource's temporal subsection. For example it could pause or slow down playback of the related media object while continuing playback of the audio resource, or it could speed up playback of the audio resource, so that the Audio Recording's time interval does not end before the audio resource's temporal subsection. This behaviour is currently unspecified and therefore implementation-defined.

    If the temporal subsection of the audio resource is shorter than the duration of the Audio Recording's time interval, then the audio resource plays once.

  • Zero or more Mixing Instructions that modify the playback characteristics of the Audio Recording.

When a list of Sources is provided, a presentation processor MUST play no more than one of the Sources for each Audio Recording.

This feature may contribute to browser fingerprintability. Implementations can use the Type, and if present, any relevant additional formatting information, to decide which Source to play. For example, given two Sources, one being a WAV file, and the other an MP3, an implementation that can play only one of those formats, or is configured to have a preference for one or the other, would select the playable or preferred version.
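As a sketch of the truncation rule above, combining the resource and clip times of Example 6 with a hypothetical 2s containing interval:

<span begin="0s" end="2s">
  <!-- clipBegin/clipEnd select a 3s subsection (5s to 8s) of the resource,
       but the span's interval is 2s, so playback is truncated
       at 7s into the resource -->
  <audio src="long_audio.wav" clipBegin="5s" clipEnd="8s"/>
  ...
</span>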

An Audio Recording is represented in a DAPT Document by an <audio> element child of a <p> or <span> element corresponding to the Text to which it applies. The following constraints apply to the <audio> element:

  • The begin, end and dur attributes represent respectively the Begin, End and Duration properties;
  • The clipBegin and clipEnd attributes represent respectively the In Time and Out Time properties, as illustrated by Example 6;
  • For each Source, if it is a link to an external audio resource, the Source and Type properties are represented by exactly one of:
    1. A src attribute that is not a fragment identifier, and a type attribute respectively;

      This mechanism cannot be used if there is more than one Source.

      <audio src="https://example.com/audio.wav" type="audio/wave"/>
    2. A <source> child element with a src attribute that is not a fragment identifier and a type attribute respectively;
      <audio>
        <source src="https://example.com/audio.wav" type="audio/wave"/>
        <source src="https://example.com/audio.aac" type="audio/aac"/>
      </audio>

    A src attribute that is not a fragment identifier is a URL that references an external audio resource, i.e. one that is not embedded within the DAPT Script. No validation that the resource can be located is specified in DAPT.

    Issue 113: Support both `@src` and `<source>` child of `<audio>` (external resources)? (question, PR-must-have)
              While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.

    Originally posted by @nigelmegitt in #105 (comment)

    The following two options exist in TTML2 for referencing external audio resources:

    1. src attribute in <audio> element:
    <audio src="https://example.com/audio_recording.wav" type="audio/wave"/>
    2. <source> element child of <audio> element:
    <audio>
      <source src="https://example.com/audio_recording.wav" type="audio/wave"/>
    </audio>

    This second option has an additional possibility of specifying a format attribute in case type is inadequate. It also permits multiple <source> child elements, and we specify that in this case the implementation must choose no more than one.

    [Edited 2023-03-29 to account for the "play no more than one" constraint added after the issue was opened]

    Issue 218: At-risk: support for `src` attribute in `<audio>` for external resource (PR-must-have)

    Possible resolution to #113.

    Issue 219: At-risk: support for `<source>` element child of `<audio>` for external resource (PR-must-have)

    Possible resolution to #113.

  • If the Source is an embedded audio resource, the Source and Type properties are represented together by exactly one of:
    1. A src attribute that is a fragment identifier that references either an <audio> element or a <data> element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it;

      This mechanism cannot be used if there is more than one Source.

      <tt>
        <head>
          <resources>
            <data type="audio/wave" xml:id="audio1">
              [base64-encoded WAV audio resource]
            </data>
          </resources>
        </head>
        <body>
          ..
          <audio src="#audio1"/>
          ..
        </body>
      </tt>
    2. A <source> child element with a src attribute that is a fragment identifier that references either an <audio> element or a <data> element, where the referenced element is a child of /tt/head/resources and specifies a type attribute and the xml:id attribute used to reference it;
      <tt>
        <head>
          <resources>
            <data type="audio/wave" xml:id="audio1wav">
              [base64-encoded WAV audio resource]
            </data>
            <data type="audio/mpeg" xml:id="audio1mp3">
              [base64-encoded MP3 audio resource]
            </data>
          </resources>
        </head>
        <body>
          ..
          <audio>
            <source src="#audio1wav"/>
            <source src="#audio1mp3"/>
          </audio>
          ..
        </body>
      </tt>
    3. A <source> child element with a <data> element child that specifies a type attribute and contains the audio recording data.
      <audio>
        <source>
          <data type="audio/wave">
            [base64-encoded WAV audio resource]
          </data>
        </source>
      </audio>

    In each of the cases above the type attribute represents the Type property.

    A src attribute that is a fragment identifier is a pointer to an audio resource that is embedded within the DAPT Script.

    If <data> elements are defined, each one MUST contain either #PCDATA or <chunk> child elements and MUST NOT contain any <source> child elements.

    <data> and <source> elements MAY contain a format attribute whose value implementations MAY use in addition to the type attribute value when selecting an appropriate audio resource.

    Editor's note

    Do we need all 3 mechanisms here? Do we need any? There may be a use case for embedding audio data, since it makes the single document a portable (though large) entity that can be exchanged and transferred with no concern for missing resources, and no need for e.g. manifest files. If we do not need to support referenced embedded audio then only the last option is needed, and is probably the simplest to implement. One case for referenced embedded audio is that it more easily allows reuse of the same audio in different document locations, though that seems like an unlikely requirement in this use case. Another is that it means that all embedded audio is in an easily located part of the document in tt/head/resources, which potentially could carry an implementation benefit? Consider marking the embedded data features as "at risk"?

    Issue 114: Support both `@src` and `<source>` child of `<audio>` (embedded resources)? (question, PR-must-have)
              While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.

    Originally posted by @nigelmegitt in #105 (comment)

    Given some embedded audio resources:

    <head>
      <resources>
        <audio xml:id="audioRecording1" type="audio/wave">
          <source>
            <data>[base64 encoded audio data]</data>
          </source>
        </audio>
        <data xml:id="audioRecording2" type="audio/wave">
          [base64 encoded audio data]
        </data>
      </resources>
    </head>

    The following two options exist in TTML2 for referencing embedded audio resources:

    1. src attribute in <audio> element referencing embedded <audio> or <data>:
    <audio src="#audioRecording1"/>
    ...
    <audio src="#audioRecording2"/>
    2. <source> element child of <audio> element:
    <audio>
      <source src="#audioRecording1"/>
    </audio>

    This second option has an additional possibility of specifying a format attribute in case type is inadequate. It also permits multiple <source> child elements, though it is unclear what the semantic is intended to be if multiple resources are specified - presumably, the implementation gets to choose one somehow.

    Issue 115: Support both referenced and inline embedded audio recordings?questionPR-must-have
              While working on the specification for adding audio recordings I reminded myself of the various ways in which an audio recording can be embedded and referenced, of which there are at least 5 in total. Requirement R15 of [DAPT](https://www.w3.org/TR/dapt-reqs/#requirements) is clear that both referenced and embedded options need to be available, but should we be syntactically restricting the options for each? Will raise as separate issues.

    Originally posted by@nigelmegitt in#105 (comment)

    If we are going to support embedded audio resources, they can either be defined in/tt/head/resources and then referenced, or the data can be included inline.

    Do we need both options?

    Example of embedded:

    <head>  <resources>    <audio xml:id="audioRecording1" type="audio/wave">      <source>        <data>[base64 encoded audio data]</data>      </source>    </audio>    <data xml:id="audioRecording2" type="audio/wave">      [base64 encoded audio data]    </data>  </resources></head>

    This would then be referenced in the body content using something like (see also#114):

    <audio src="#audioRecording2"/>

    Example of inline:

    <audio type="audio/wave">  <source type="audio/wave">    <data>[base64 encoded audio data]</data>  </source></audio>
    Issue 220: At-risk: support for `src` attribute of `<audio>` element pointing to embedded resourcePR-must-have

    Possible resolution to#114 and#115.

    The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

    Issue 221: At-risk: support for `<source>` child of `<audio>` element pointing to embedded resourcePR-must-have

    Possible resolution to#114 and#115.

    The link to #115 is that this implies the existence of some referenceable embedded audio resource too, which is one of the options described in #115.

    Issue 222: At-risk: support for inline audio resourcesPR-must-have

    Possible resolution to#115.

    Issue 116: Add non-inlined embedded audio resources to the Data Model?questionPR-must-have

    See also#115 - if we are going to support non-inline embedded audio resources, should we make an object for them and add it into the Data Model?

    Issue 117: Embedded data: Do we need to support all the permitted encodings? What about length?questionPR-must-have

    In TTML2's <data> element, an encoding can be specified, being one of:

    • base16
    • base32
    • base32hex
    • base64
    • base64url

    Do we need to require processor support for all of them, or will the default base64 be adequate?

    Also, it is possible to specify a length attribute that provides some capability for error checking, since the decoded data must be the specified length in bytes. Is requiring support for this a net benefit? Would it be used?

    Issue 223: At-risk: each of the potential values of `encoding` in `<data>`PR-must-have

    Possible resolution to#117.

    Issue 224: At-risk: support for the `length` attribute on `<data>`PR-must-have

    Possible resolution to#117.

  • Mixing Instructions MAY be applied as specified in their TTML representation;
  • The computed value of the xml:lang attribute MUST be identical to the computed value of the xml:lang attribute of the parent element and any child <source> elements and any referenced embedded <data> elements.

4.9.2 Synthesized Audio

A Synthesized Audio is an Audio object that represents a machine-generated audio rendering of the parent Text content. It has the following properties:

  • A mandatory Rate that specifies the rate of speech, being normal, fast or slow;
  • An optional Pitch that allows adjustment of the pitch of the speech.

A Synthesized Audio is represented in a DAPT Document by the application of a tta:rate style attribute on the element representing the Text object to be spoken, where the computed value of the attribute is normal, fast or slow. This attribute also represents the Rate Property.

The tta:pitch style attribute represents the Pitch property.

The TTML representation of a Synthesized Audio is illustrated by Example 6.
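
For orientation only (the normative illustration is Example 6), a minimal hedged sketch of a paragraph carrying a Synthesized Audio, assuming the tta prefix is bound to the TT Audio Style namespace; the rate and pitch values are illustrative:

<p begin="10s" end="12s" tta:rate="fast" tta:pitch="+10%">
  She opens the door and looks outside.
</p>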

Note

A tta:pitch attribute on an element whose computed value of the tta:rate attribute is none has no effect. Such an element is not considered to have an associated Synthesized Audio.

Note

The semantics of the Synthesized Audio vocabulary of DAPT are derived from equivalent features in [SSML] as indicated in [TTML2]. This version of the specification does not specify how other features of [SSML] can be either generated from DAPT or embedded into DAPT documents. The option to extend [SSML] support in future versions of this specification is deliberately left open.

4.10 Mixing Instruction

A Mixing Instruction object is a static or animated adjustment of the audio relating to the containing object. It has the following properties:

A Mixing Instruction is represented by applying audio style attributes to the element that corresponds to the relevant object, either inline, by reference to a <style> element, or in a child (inline) <animate> element:

If the Mixing Instruction is animated, that is, if the adjustment properties change during the containing object's active time interval, then it is represented by one or more child <animate> elements. This representation is required if more than one Gain or Pan property is needed, or if any timing properties are needed.

The <animate> element(s) MUST be children of the element corresponding to the containing object, and have the following constraints:

The TTML representation of animated Mixing Instructions is illustrated by Example 4.
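
For orientation only (the normative illustration is Example 4), a hedged sketch of an animated gain adjustment, with invented timing and values, assuming the tta prefix is bound to the TT Audio Style namespace:

<p begin="5s" end="15s">
  <animate begin="3s" end="5s" tta:gain="1;0.2" fill="freeze"/>
  The music fades as she begins to speak.
</p>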

See also E. Audio Mixing.

5. Constraints

5.1 Document Encoding

A DAPT Document MUST be serialised as a well-formed XML 1.0 [xml] document encoded using the UTF-8 character encoding as specified in [UNICODE].

The resulting [xml] document MUST NOT contain any of the following physical structures:

Note

The resulting [xml] document can contain character references, and entity references to the predefined entities.

The predefined entities are (including the leading ampersand and trailing semicolon):

  • &amp; for an ampersand & (Unicode code point U+0026)
  • &apos; for an apostrophe ' (Unicode code point U+0027)
  • &gt; for a greater-than sign > (Unicode code point U+003E)
  • &lt; for a less-than sign < (Unicode code point U+003C)
  • &quot; for a quotation mark " (Unicode code point U+0022)
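For example, a literal ampersand within character content is escaped using the predefined entity; the timing values are illustrative:

<p begin="10s" end="12s">Tom &amp; Jerry</p>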
Note

ADAPT Document can also be used as an in-memory model for processing, in which case the serialisation requirements do not apply.

5.2 Processing of unrecognised or foreign elements and attributes

The requirements in this section are intended to facilitate forwards and backwards compatibility, specifically to permit:

A DAPT document that conforms to more than one version of the specification could specify conformance to multiple DAPT content profiles.

5.2.1 Unrecognised vocabulary

Unrecognised vocabulary is the set of elements and attributes that are not associated with features that the processor supports.

A transformation processor MUST prune unrecognised vocabulary that is neither an attribute nor a descendant of a <metadata> element.

A transformation processor SHOULD preserve unrecognised vocabulary that is either an attribute or a descendant of a <metadata> element.

Note

See also 5.6.2 ttp:contentProfiles, which prohibits the signalling of profile conformance to profiles that the transformation processor does not support.

After attribute value computation, a presentation processor SHOULD ignore unrecognised vocabulary.

Note

The above constraint is specified as being after attribute value computation because it is possible that an implementation recognises and supports attributes present only on particular elements, for example those corresponding to the DAPT data model. As described in 6.4 Using computed attribute values, it is important that processor implementations do not ignore such attributes when present on other elements.

5.2.2 Special considerations for foreign vocabulary

Foreign vocabulary is the subset of unrecognised vocabulary that consists of those elements and attributes whose namespace is not one of the namespaces listed in 5.3 Namespaces, and those attributes whose namespace has no value, that are not otherwise defined in DAPT or in [TTML2].

A DAPT Document MAY contain foreign vocabulary that is neither specifically permitted nor forbidden by the profiles signalled in ttp:contentProfiles.

Note

For validation purposes it is good practice to define and use a specification for all foreign vocabulary used within a DAPT Document, for example a content profile.

5.2.3 Proprietary Metadata and Foreign Vocabulary

Many dubbing and audio description workflows permit annotation of Script Events or documents with proprietary metadata. Metadata vocabulary defined in this specification or in [TTML2] MAY be included. Foreign vocabulary MAY also be included, either as attributes of <metadata> elements or as descendant elements of <metadata> elements.

Note

It is possible to add information such as the title of the programme using [TTML2] constructs.

...
<head>
  <metadata>
    <ttm:title>An example document title</ttm:title>
  </metadata>
</head>
...
Note

It is possible to add workflow-specific information using a foreign namespace. In the following example, a fictitious vendorm namespace from an "example vendor" is used to provide document-level information not defined by DAPT.

...
<metadata xmlns:vendorm="http://www.example-vendor.com/ns/ttml#metadata">
  <vendorm:programType>Episode</vendorm:programType>
  <vendorm:episodeSeason>5</vendorm:episodeSeason>
  <vendorm:episodeNumber>8</vendorm:episodeNumber>
  <vendorm:internalId>15734</vendorm:internalId>
  <vendorm:information>Some proprietary information</vendorm:information>
</metadata>
...

It is strongly recommended not to place data whose semantics depend on the contents of the document within <metadata> elements.

Such data can be invalidated by transformation processors that modify the contents of the document but preserve metadata while being unaware of its semantics.

5.2.3.1 Defining and using foreign vocabulary that is not metadata

This section is non-normative.

If foreign vocabulary is included in locations other than <metadata> elements it will be pruned by transformation processors that do not support features associated with that vocabulary, as required in 5.2.1 Unrecognised vocabulary.

A mechanism is provided to prevent such pruning and to define semantics for such foreign vocabulary, allowing it to be located outside a <metadata> element without being pruned, and to indicate content and processor conformance (see the sketch after this list):

  1. Define a profile including a feature definition for that semantic and vocabulary, with a profile designator.
  2. Signal document conformance to that profile using the ttp:contentProfiles attribute described in 5.6.2.

This allows processors that support the feature to process the vocabulary in whatever way is appropriate, to avoid pruning it, and allows processors that do not support the feature to take appropriate action, for example warning users that some functionality may be lost.
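
A sketch of step 2, assuming a hypothetical additional profile with designator http://www.example.com/ns/ttml/profile/myVocab/content; the DAPT content profile designator is followed by the additional designator as a second whitespace-separated value:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content
                         http://www.example.com/ns/ttml/profile/myVocab/content"
    ...>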

5.3 Namespaces

The following namespaces (see [xml-names]) are used in this specification:

Name | Prefix | Value | Defining Specification
XML | xml | http://www.w3.org/XML/1998/namespace | [xml-names]
TT | tt | http://www.w3.org/ns/ttml | [TTML2]
TT Parameter | ttp | http://www.w3.org/ns/ttml#parameter | [TTML2]
TT Audio Style | tta | http://www.w3.org/ns/ttml#audio | [TTML2]
TT Metadata | ttm | http://www.w3.org/ns/ttml#metadata | [TTML2]
TT Feature | none | http://www.w3.org/ns/ttml/feature/ | [TTML2]
DAPT Metadata | daptm | http://www.w3.org/ns/ttml/profile/dapt#metadata | This specification
DAPT Extension | none | http://www.w3.org/ns/ttml/profile/dapt/extension/ | This specification
EBU-TT Metadata | ebuttm | urn:ebu:tt:metadata | [EBU-TT-3390]

The namespace prefix values defined above are for convenience and DAPT Documents MAY use any prefix value that conforms to [xml-names].

The namespaces defined by this specification are mutable as described in [namespaceState]; all undefined names in these namespaces are reserved for future standardization by the W3C.

5.4 Related Media Object

Within DAPT, the common language terms audio and video are used in the context of a programme. The audio and video are each a part of what is defined in [TTML2] as the Related Media Object, which provides the media timeline and is the source of the main programme audio and of any visual timing references needed when adjusting timings relative to the video image, such as for lip synchronization.

Note

A DAPT document can identify the programme acting as the Related Media Object using metadata. For example, it is possible to use the <ebuttm:sourceMediaIdentifier> element defined in [EBU-TT-3390].

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
    xmlns:ebuttm="urn:ebu:tt:metadata"
    xml:lang="en"
    ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    daptm:scriptRepresents="audio.dialogue"
    daptm:scriptType="originalTranscript">
  <head>
    <metadata>
      <ebuttm:sourceMediaIdentifier>https://example.org/programme.mov</ebuttm:sourceMediaIdentifier>
    </metadata>
  </head>
  <body>
    ...
  </body>
</tt>

5.5 Synchronization

If the DAPT Document is intended to be used as the basis for producing a [TTML-IMSC1.2] document, the synchronization provisions of [TTML-IMSC1.2] apply in relation to the video.

Timed content within the DAPT Document is intended to be rendered starting and ending on specific audio samples.

Note

In the context of this specification rendering could be visual presentation of text, for example to show an actor what words to speak, or could be audible playback of an audio resource, or could be physical or haptic, such as a Braille display.

In constrained applications, such as real-time audio mixing and playback, if accurate synchronization to the audio sample cannot be achieved in the rendered output, the combined effects of authoring and playback inaccuracies in timed changes in presentation SHOULD meet the synchronization requirements of [EBU-R37], i.e. audio changes are not to precede image changes by more than 40ms, and are not to follow them by more than 60ms.

Likewise, authoring applications SHOULD allow authors to meet the requirements of [EBU-R37] by defining times with an accuracy such that changes to audio are less than 15ms after any associated change in the video image, and less than 5ms before any associated change in the video image.

Taken together, the above two constraints on overall presentation and on DAPT documents intended for real-time playback mean that content processors SHOULD complete audio presentation changes no more than 35ms (40ms - 5ms) before the time specified in the DAPT document and no more than 45ms (60ms - 15ms) after the time specified.

5.6 Profile Signaling

This section defines how a TTML Document Instance signals that it is a DAPT Document and how it signals any processing requirements that apply. See also 7.1 Conformance of DAPT Documents, which defines how to establish that a DAPT Document conforms to this specification.

5.6.1 Profile Designators

This profile is associated with the following profile designators:

Profile Name | Profile type | Profile Designator
DAPT 1.0 Content Profile | content profile | http://www.w3.org/ns/ttml/profile/dapt1.0/content
DAPT 1.0 Processor Profile | processor profile | http://www.w3.org/ns/ttml/profile/dapt1.0/processor

5.6.2 ttp:contentProfiles

The ttp:contentProfiles attribute is used to declare the [TTML2] profiles to which the document conforms.

DAPT Documents MUST specify a ttp:contentProfiles attribute on the <tt> element including at least one value equal to a content profile designator specified at 5.6.1 Profile Designators. Other values MAY be present to declare conformance to other profiles of [TTML2], and MAY include profile designators in proprietary namespaces.

It is an error for a DAPT Document to signal conformance to a content profile to which it does not conform.

Transformation processors MUST NOT include values within the ttp:contentProfiles attribute associated with profiles that they (the processors) do not support; by definition they cannot verify conformance of the content to those profiles.

5.6.3 ttp:profile

The ttp:profile attribute is a mechanism within [TTML1] for declaring the processing requirements for a Document Instance. It has effectively been superseded in [TTML2] by ttp:processorProfiles.

DAPT Documents MUST NOT specify a ttp:profile attribute on the <tt> element.

5.6.4 ttp:processorProfiles

The ttp:processorProfiles attribute is used to declare the processing requirements for a Document Instance.

DAPT Documents MAY specify a ttp:processorProfiles attribute on the <tt> element. If present, the ttp:processorProfiles attribute MUST include at least one value equal to a processor profile designator specified at 5.6.1 Profile Designators. Other values MAY be present to declare additional processing constraints, and MAY include profile designators in proprietary namespaces.

Note

The ttp:processorProfiles attribute can be used to signal that features and extensions in additional profiles need to be supported to process the Document Instance successfully. For example, a local workflow might introduce particular metadata requirements, and signal that the processor needs to support those by using an additional processor profile designator.

Note

If the content author does not need to signal any processor requirements additional to those defined by DAPT, the ttp:processorProfiles attribute is not expected to be present.

5.6.5 Other TTML2 Profile Vocabulary

[TTML2] specifies a vocabulary and semantics that can be used to define the set of features that a document instance can make use of, or that a processor needs to support, known as a Profile.

Except where specified, it is not a requirement of DAPT that this profile vocabulary be supported by processors; nevertheless such support is permitted.

The majority of this profile vocabulary is used to indicate how a processor can compute the set of features that it needs to support in order to process the Document Instance successfully. The vocabulary is itself defined in terms of TTML2 features. Those profile-related features are listed within F. Profiles as being optional. They MAY be implemented in processors and their associated vocabulary MAY be present in DAPT Documents.

Note

Unless processor support for these features and vocabulary has been arranged (using an out-of-band protocol), the vocabulary is not expected to be present.

The additional profile-related vocabulary for which processor support is not required (but is permitted) in DAPT is:

5.7 Timing constraints

Within a DAPT Script, the following constraints apply in relation to time attributes and time expressions:

5.7.1 ttp:timeBase

The only permitted ttp:timeBase attribute value is media, since F. Profiles prohibits all timeBase features other than #timeBase-media.

This means that the beginning of the document timeline, i.e. time "zero", is the beginning of the Related Media Object.

5.7.2 timeContainer

The only permitted value of the timeContainer attribute is the default value, par.

Documents SHOULD omit the timeContainer attribute on all elements.

Documents MUST NOT set the timeContainer attribute to any value other than par on any element.

Note

This means that the begin attribute value for every timed element is relative to the computed begin time of its parent element, or, for the <body> element, to time zero.

5.7.3 ttp:frameRate

If the document contains any time expression that uses the f metric, or any time expression that contains a frames component, the ttp:frameRate attribute MUST be present on the <tt> element.

Note
[TTML2] specifies the ttp:frameRateMultiplier attribute for defining non-integer frame rates.

5.7.4 ttp:tickRate

If the document contains any time expression that uses the t metric, the ttp:tickRate attribute MUST be present on the <tt> element.
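
An illustrative sketch combining both requirements; the rates, times and structure are invented for illustration and other mandatory DAPT attributes are elided:

<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    ttp:frameRate="25" ttp:tickRate="10000" ...>
  <body>
    <div xml:id="se1" begin="120f" end="150f"><!-- 4.8s to 6s at 25 frames per second --></div>
    <div xml:id="se2" begin="65000t" end="72000t"><!-- 6.5s to 7.2s at 10000 ticks per second --></div>
  </body>
</tt>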

5.7.5 Time expressions

All time expressions within a document SHOULD use the same syntax, either clock-time or offset-time as defined in [TTML2], with DAPT constraints applied.

Note

A DAPT clock-time has one of the forms:

  • hh:mm:ss
  • hh:mm:ss.sss

where hh is hours, mm is minutes, ss is seconds, and ss.sss is seconds with a decimal fraction of seconds (any precision).

Note

Clock time expressions that use frame components, which look similar to "time code", are prohibited due to the semantic confusion that has been observed elsewhere when they are used, particularly with non-integer frame rates, "drop modes" and sub-frame rates.

Note

An offset-time has one of the forms:

  • nn metric
  • nn.nn metric

where nn is an integer, nn.nn is a number with a decimal fraction (any precision), and metric is one of:

  • h for hours
  • m for minutes
  • s for seconds
  • ms for milliseconds
  • f for frames
  • t for ticks

When mapping a media time expression M to a frame F of the video, e.g. for the purpose of accurately timing lip synchronization, the content processor SHOULD map M to the frame F with the presentation time that is the closest to, but not less than, M.

A media time expression of 00:00:05.1 corresponds to frame ceiling(5.1 × (1000 / 1001 × 30)) = 153 of a video that has a frame rate of 1000 / 1001 × 30 ≈ 29.97.

5.8 Layout and styles

This specification does not put additional constraints on the layout and rendering features defined in [TTML-IMSC1.2].

Note
Layout of the paragraphs may rely on the default TTML region (i.e. if no <layout> element is used in the <head> element) or may be made explicit by use of the region attribute, referring to a <region> element present at /tt/head/layout/region.

Style references or inline styles MAY be used, using any combination of style attributes, <style> elements and inline style attributes as defined in [TTML2] or [TTML-IMSC1.2].

5.9 Bidirectional text

The following metadata elements are permitted in DAPT and specified in [TTML2] as containing #PCDATA, i.e. text data only with no element content. Where bidirectional text is required within the character content of such an element, Unicode control characters can be used to define the base direction within arbitrary ranges of text.

Note

More guidance about usage of this mechanism is available at Inline markup and bidirectional text in HTML.

The <p> and <span> content elements permit the direction of text to be specified using the tts:direction and tts:unicodeBidi attributes. Document authors should use this more robust mechanism rather than using Unicode control characters.

Note

The following example taken from [TTML2] demonstrates the syntax for bidirectional text markup within the <p> and <span> elements.

<p>The title of the book is
  "<span tts:unicodeBidi="embed" tts:direction="rtl">نشاط التدويل، W3C</span>"</p>

An example rendering of the above fragment is shown below.

[Figure: example rendition of the direction example, showing "The title of the book is" and "W3C" left to right, followed by the applicable Arabic text right to left.]

6. Mapping from TTML to the DAPT Data Model

4. DAPT Data Model and corresponding TTML syntax defines how objects and properties of the DAPT data model are represented in [TTML2], i.e. in a DAPT Document. However, a DAPT data model instance can be represented by multiple [TTML2] document instances.

For example, 4. DAPT Data Model and corresponding TTML syntax does not mandate that a <div> element representing a Script Event be a direct child of the <body> element. That <div> element could be nested in another <div> element. Therefore, it is possible to serialize the objects and properties of a DAPT Script into various DAPT Documents. This section defines how to interoperably and unambiguously reconstruct a DAPT model instance from a DAPT Document.

Note

DAPT does not define a complete serialization of the DAPT data model, for extensibility reasons, to allow future versions to do so if needed. Additionally, a DAPT Document can contain elements or attributes that are not mentioned in the representations of DAPT objects or properties. This could be because it has been generated by a processor conformant to some future version of DAPT, or through a generic [TTML2] process, or because it uses optional features, for example to add styling or layout. This section defines how to process those elements or attributes.

Note

It is also possible to process DAPT Documents using generic [TTML2] processors, which do not necessarily map the documents to the DAPT data model. For example a generic TTML2 presentation processor could render an audio mix based on a DAPT document without needing to model Script Events per se. In that case, this section can be ignored.

6.1 Early identification of non-conformant documents

This section is non-normative.

Normative provisions relating to this section are defined in [TTML2].

Since it is a requirement of DAPT that DAPT Documents include a ttp:contentProfiles attribute on the root element, and that the attribute includes a DAPT content profile designator, as specified at 5.6.2 ttp:contentProfiles, it follows that any TTML document that does not include such an attribute, or does not include such a profile designator, can be considered not to be a DAPT Document; therefore a processor requiring strict adherence to DAPT could stop processing such a document.

6.2 Not supporting features excluded by the content profile

A processor that takes as its input a DAPT document that contains vocabulary relating to features that it does support, but where support for those features is excluded from the content profiles to which the document claims conformance, SHOULD NOT implement those features in the context of that document.

6.3 Handling <div> and <p> elements

[TTML2] allows <div> elements to contain any combination of <div> elements and <p> elements. The DAPT data model describes how each Script Event is represented by a <div> element that contains zero or more <p> elements. It also permits other intermediate <div> elements in the path between the <body> element and those Script Event <div> elements. In addition, attributes not corresponding to properties in the DAPT data model are permitted.

This gives rise to possibilities such as:

The following processing rules resolve these cases.

Rules for identifying Script Events:

  1. A <div> element that has no <div> element children and includes the TTML representations of all the mandatory properties of a Script Event, such as a valid xml:id representing the Script Event Identifier, MUST be mapped to a Script Event, even if it also contains additional unrecognised vocabulary;
  2. A <div> element that contains any <div> element children MUST NOT be mapped to a Script Event; the processor instead MUST iterate through those <div> element children (recursively, in a depth-first traversal) and consider if each one meets the requirements of a Script Event;
  3. Any remaining unmapped <div> elements MUST NOT be mapped to a Script Event.

Rules for identifying Text objects:

  1. A <p> element that is a child of a <div> element that maps to a Script Event MUST be mapped to a Text object.
  2. A <p> element that is not a child of a <div> element that maps to a Script Event MUST NOT be mapped to a Text object.
Note

Future versions of DAPT could include features that use these structural possibilities differently, and therefore define other processing rules that are mutually exclusive with the rules defined here.

Example 37
This example demonstrates the above possibilities, and the application of the rules:
<body>
  <div xml:id="d1">
    <!-- This is a Script Event -->
    <p><!-- This is a Text --></p>
  </div>
  <div>
    <!-- This cannot be a Script Event because it has no xml:id -->
    <p><!-- Would be a Text if its parent were a Script Event --></p>
  </div>
  <div xml:id="d2_1">
    <!-- div parent of another div -->
    <div xml:id="d2">
      <!-- Possibly a Script Event -->
    </div>
  </div>
  <div xml:id="d3_1">
    <!-- double layer of nesting -->
    <div xml:id="d3_1_1">
      <div xml:id="d3" begin="..." end="..." xml:lang="ja" foo:bar="baz">
        <!-- A Script Event with possibly unexpected attributes -->
      </div>
    </div>
  </div>
  <div xml:id="d4_1">
    <!-- mixed div and p children -->
    <div xml:id="d4_2">
      <!-- This possible Script Event has a sibling <p> -->
    </div>
    <p><!-- Possible Text, but not if its parent is not a Script Event --></p>
  </div>
</body>

The result of applying the above rules to this example is:

6.4 Using computed attribute values

Some attributes have semantics for computing their value that depend on the computed value of the attribute on some other element. For example if the xml:lang attribute is not specified on an element then its computed value is the computed value of the same attribute on the element's parent.

Or, for another example, the computed times of an element in a DAPT document are relative to the begin time of the element's parent. If a <div> element specifies a begin attribute, then the computed times of its child <div> elements are relative to that parent <div> element's begin time, and so on down the hierarchy. It is important to include those "intermediate" <div> elements' times in the computation even if the processing target is an instance of the DAPT data model in which they have no direct equivalent; otherwise the Script Event Begin and End times would be wrong.

Considering this situation more generally, it is possible that, within a DAPT document, there can be TTML elements that do not directly correspond to objects in the DAPT data model, and those elements can specify attributes that affect the computation of attribute values that apply to elements that do correspond to objects in the DAPT data model.

The semantics defined by [TTML2] or, for vocabulary defined herein, this specification, take precedence in this scenario. Implementations MUST compute attribute values based on the contents of the document instance before applying those computed values to DAPT data model objects. For example a processor that supports TTML2 styling features would need to implement the TTML2 semantics for inheritance and computing relative values of attributes like tts:fontSize.

Example 38

This example demonstrates these possibilities, which a processor might need to handle:

<tt ... xml:lang="en">
  <body>
    <div xml:id="d1" begin="00:01:00" end="00:01:10">
      <!-- Script Event beginning at 1 minute, duration 10 seconds -->
      <p><!-- This is a Text, language "en" --></p>
    </div>
    <div begin="00:10:00" xml:lang="fr">
      <!-- div that is not a Script Event -->
      <div xml:id="d2" begin="00:01:00" end="00:01:10">
        <!-- Script Event beginning at 11 minutes, duration 10 seconds -->
        <p><!-- This is a Text, language "fr" --></p>
      </div>
    </div>
  </body>
</tt>

Here, the <div> elements that correspond to Script Events, which have ids d1 and d2, are otherwise (apart from their identifiers) identical, yet their language and Begin properties have different computed values, due to the attributes specified on the <div> element parent of d2.

If an implementation internally creates DAPT data model objects, such as the Script Events shown in the example above, it is important for interoperability that it uses the computed values.

6.5 Considerations for transformation and validation processors

This section is non-normative.

6.5.1 Retaining unrecognised vocabulary

As per 5.2.1 Unrecognised vocabulary, implementers of DAPT processors are encouraged to maintain unrecognised vocabulary within <metadata> elements in DAPT Documents. In practice it is possible that an implementation which both inputs and outputs DAPT documents might modify the input document structure and contents, and while doing so, effectively prune unrecognised vocabulary from the output document.

6.5.2 Validation warnings and errors

Normative provisions relating to this section are defined in [TTML2].

[TTML2] defines a validation processor, a class of implementation whose purpose is to assess a document instance and decide if it is valid or not. Typically this would be used within a processing workflow to check that documents are acceptable for onward usage. A real world example would likely not return simply "good" or "bad", but would also output informational, warning and error messages describing unusual, unexpected or problematic contents in the input document.

When implementing a validation processor for a DAPT document, strict [TTML2] validation processing rules can be applied. Doing this involves checking for the presence and semantic validity of syntax and content associated with required or optional features defined in the profile, and checking for the absence of syntax associated with prohibited features.

The [TTML2] mechanism for dealing with vocabulary in unrecognised namespaces is to prune it prior to validation. This approach can be used; additionally it could be reasonable for an implementation to report as information those attributes and elements that have been pruned.

Note

The term "vocabulary" here refers to XML elements and attributes.

Validation warnings could be issued when unsupported or deprecated vocabulary in recognised namespaces is encountered after pruning, or when supported vocabulary contains unexpected but not invalid content, but in these scenarios errors are not expected.

Validation errors are expected when prohibited vocabulary is present, or when semantically invalid content within permitted vocabulary is encountered.

7. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, MUST NOT, SHOULD, and SHOULD NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

[TTML2] specifies a formal language for expressing document and processor requirements, within the Profiling sub-system. The normative requirements of this specification are defined using the conformance terminology described above, and are also defined using this TTML2 profile mechanism. Where TTML2 vocabulary is referenced, the syntactic and semantic requirements relating to that vocabulary as defined in [TTML2] apply.

Whilst there is no requirement for a DAPT processor to implement the TTML2 profile processing semantics in general, implementers can use the TTML2 profiles defined in F. Profiles as a means of verifying that their implementations meet the normative requirements of DAPT, for example as a checklist.

Conversely, a general purpose [TTML2] processor that does support the TTML2 profile processing semantics can use the TTML2 profiles defined in F. Profiles directly to determine if it is capable of processing a DAPT document.

7.1 Conformance of DAPT Documents

Conformant DAPT Documents are [TTML2] timed text content document instances that conform to the normative provisions of this specification. Those provisions are expressed using the profile vocabulary of [TTML2] in the content profile defined in F. Profiles.

Note

DAPT Documents remain subject to the content conformance requirements specified at Section 3.1 of [TTML2]. In particular, a DAPT Document can contain foreign vocabulary, since such elements and attributes are pruned by the algorithm at Section 4 of [TTML2] prior to evaluating content conformance.

7.2 Conformance of DAPT Processors

Conformant DAPT Processors are [TTML2] content processors that conform to the normative provisions of this specification. Those provisions are expressed using the profile vocabulary of [TTML2] in the processor profile defined in F. Profiles.

Note
The provisions in this specification are expressed in terms of [TTML2] content processor types, using terms like content processor, presentation processor, validation processor, or transformation processor. They may be understood with the mapping below:

A. Index

A.1 Terms defined by this specification

A.2 Terms defined by reference

B. Privacy Considerations

This section is non-normative.

With the exception of the following, the privacy considerations of [TTML2] apply:

B.1 Personal Information

DAPT documents typically contain the names of characters or people who feature within the associated media, either fictional or real. In general this information would be present within the media itself or be public via other routes. If there is sensitivity associated with those identities being known to people with access to the DAPT documents that contain them, then such access should be managed with appropriate confidentiality. For example those documents could be available within a closed authoring environment and edited to remove the sensitive information prior to distribution to a wider audience. If this scenario arises, information security good practices within the closed environment should be applied, such as encryption of the document "at rest" and when being moved, access controlled by authentication platforms, etc.

B.2 Audio format preference

This feature may contribute to browser fingerprintability. DAPT documents can reference a set of alternate external audio resources for the same fragment of audio, where the processor is expected to select one of the alternatives based on features such as format support. If this pattern is used, it is possible that the processor's choice of audio resource, being exposed to the origin, reveals information about that processor, such as its preferred audio format.

Note
Document authors and content providers can avoid referencing external audio resources by embedding the audio resources within the document. This usage pattern does not expose the processor's choice of audio resource to the origin.

C. Security Considerations

This section is non-normative.

The security considerations of [TTML2] apply.

D. Timecode-related metadata

DAPT Documents express time as media time, which assumes that there is a reference start time (zero) on a media timeline, and a fixed playrate. An alternative scheme used in some other script formats is to synchronise media components using timecodes, which match time stamps applied, for example, to each frame of video.

Workflows that create DAPT documents from such timecode-based non-DAPT script formats need to map those timecode values onto the DAPT document's timeline.

If this mapping is not correct, presentation of the DAPT Document will not be synchronised with the related media object. A reference timecode that matches a known point on the DAPT document timeline can be used to achieve correct synchronisation, for example the timecode corresponding to the start of the programme, which should match DAPT time zero.

In this scenario, if such a reference point is not known, but timecodes corresponding to Script Events are known, it is still possible to construct a DAPT Document, albeit one whose synchronisation with the related media has not yet been resolved.

The optional DAPT Origin Timecode and Start of Programme Timecode properties can be used to identify when there is a possible synchronisation error, and to resynchronise the document when all the required information is known.

These properties are provided as metadata only and are not intended to be used to perform direct synchronisation offsets during presentation. In particular, when the related media object uses timecode, the presence of the timecode properties does not mean that the player needs to relate these timecode values to any timecode value embedded in the related media resource.

If either a DAPT Origin Timecode object or a Start of Programme Timecode object is present, the DAPT Document MUST have one <head> element child of the <tt> element, and that <head> element MUST have at least one <metadata> element child.

D.1 DAPT Origin Timecode

The optional DAPT Origin Timecode allows a timecode value to be declared that corresponds to the zero point of the DAPT document timeline, that is, the time of a hypothetical Script Event whose Begin is zero seconds.

The properties can be used to provide timecodes from the related media object or from any other script format used to produce the DAPT document, and are informational. However, when they are both provided and differ, it is an indication that the DAPT document is not synchronized with the related media object and that processing of the Script Event begin times is needed. To achieve synchronization, the following needs to be done (a worked example follows the list):

  1. The difference DAPT Origin Timecode minus the Start of Programme Timecode is computed as "delta". It may be positive or negative.
  2. Each Script Event's Begin and End value X shall be changed to X + delta.
  3. The DAPT Origin Timecode shall be removed or changed to the Start of Programme Timecode value so that if this algorithm is run again it will result in a delta of zero.
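
As a worked example, assuming a frame rate of 25 and, illustratively, a DAPT Origin Timecode of 10:01:20:12 and a Start of Programme Timecode of 10:00:00:00 (the values used in Example 41): delta = 10:01:20:12 minus 10:00:00:00 = 00:01:20:12, i.e. 80 + 12/25 = 80.48 seconds. A Script Event with Begin 0s and End 1.8s becomes Begin 80.48s and End 82.28s, and the DAPT Origin Timecode is then removed or set to 10:00:00:00.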

A DAPT Script MAY contain zero or one DAPT Origin Timecode objects.

The DAPT Origin Timecode object is represented in a DAPT Document by a <daptm:daptOriginTimecode> element present at the path /tt/head/metadata/daptm:daptOriginTimecode, with the following constraints:

Note

See also 5.7.3 ttp:frameRate. No mechanism is defined here for declaring a different frame rate for the DAPT Origin Timecode than is used for other frame-based time expressions.

If the related media object contains a timecode for the video frame synchronized to the origin of the DAPT document timeline, the DAPT Origin Timecode is equal to that timecode.

Example 39
...
<head>
  <metadata>
    <daptm:daptOriginTimecode>10:01:20:12</daptm:daptOriginTimecode>
  </metadata>
</head>
<body>
  <div xml:id="se1" begin="0s" end="1.8s">
    <!-- This script event was generated from a source whose begin timecode was 10:01:20:12 -->
  </div>
</body>
...
Example 40

When converting legacy formats that store the equivalent of each Script Event, for example a description, with SMPTE timecodes, but where the Start of Programme Timecode is not stored and is not immediately available, the DAPT Origin Timecode can be used to defer synchronisation with the media until that media begin timecode is known.

One approach to deferring synchronisation using DAPT Origin Timecode is as follows, assuming that a legacy source file is being converted to a DAPT Document and relates to media with continuous timecode:

When the related media's timecode is later discovered, the times of the Script Events can be adjusted so that media time zero is coincident with the Start of Programme Timecode, and the DAPT Origin Timecode can be removed or set to the now-known value.

Note

The above algorithm can be used to align the begin times of the DAPT Script and the related media object. It does not address progressive synchronisation errors caused by incorrect frame rates or differing playback rates.

D.2 Start of Programme Timecode

The optional Start of Programme Timecode allows a timecode value to be declared that corresponds to the beginning of the related media object's programme content.

In combination with DAPT Origin Timecode, the value of Start of Programme Timecode can be used to infer whether or not the media times in the DAPT Script are likely to be correctly synchronised with the Related Media Object.

If both DAPT Origin Timecode and Start of Programme Timecode are present, but their values are different, it is likely that the media times are not synchronised with the Related Media Object, since this implies that the timecode equivalent to zero seconds in media time is not the start of the programme, which is the requirement for correctly synchronised media time.

A DAPT Script MAY contain zero or one Start of Programme Timecode objects.

The Start of Programme Timecode object is represented in a DAPT Document by an <ebuttm:documentStartOfProgramme> element present at the path /tt/head/metadata/ebuttm:documentStartOfProgramme, with the syntactic constraints as defined in [EBU-TT-3390].

Example 41
...
<head>
  <metadata>
    <daptm:daptOriginTimecode>10:01:20:12</daptm:daptOriginTimecode>
    <ebuttm:documentStartOfProgramme>10:00:00:00</ebuttm:documentStartOfProgramme>
    <!-- It is likely that this document is 1 minute, 20 seconds and 12 frames too early,
      compared to the related media -->
  </metadata>
</head>
<body>
  <div xml:id="se1" begin="0s" end="1.8s">
    <!-- This script event was generated from a source whose begin timecode was 10:01:20:12 -->
  </div>
</body>
...

If the times of the Script Events are adjusted to bring the media time into synchronisation with the Related Media Object, as noted in D.1 DAPT Origin Timecode, the Start of Programme Timecode SHOULD NOT be changed, since it is an invariant feature of the Related Media Object, and does not describe the times in the DAPT Document.

E. Audio Mixing

This section is non-normative.

Applying the Mixing Instructions can be implemented using [webaudio]. Figure 2 shows the flow of programme audio, and how, when audio-generating elements are active, the pan and gain (if set) on the Script Event are applied, then the output is passed to the Text, which mixes in the audio from any active Audio Recording, itself subject to its own Mixing Instructions, then the result has the Text's Mixing Instructions applied, prior to the output being mixed onto the master bus.

[Figure: programme audio routed through the Pan and Gain stages of the active Script Event, Audio Recording and Text objects when audio mixing is active, and passed through unchanged when no audio mixing is active.]
Figure 2: Example simple audio routing between objects

This example is shown as [webaudio] nodes in Figure 3.

[Figure: GainNode and PanNode pairs for the Script Event, Text and Audio Recording, an implicit mixer and the master bus, connecting the Programme Audio and the AudioRecording Source Audio to the Output Audio.]
Figure 3: Web audio nodes representing the audio processing needed.
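
A minimal sketch of the routing in Figure 3 using the Web Audio API; the node graph and pan/gain values are illustrative only, and programmeSource and recordingSource are assumed to be existing AudioNode instances (for example MediaElementAudioSourceNode or AudioBufferSourceNode):

// Sketch only: one active Audio Recording within one active Text within one
// active Script Event. Pan and gain values are illustrative, not normative.
const ctx = new AudioContext();

// Build a pan -> gain stage representing one object's Mixing Instructions.
function panGainStage(pan = 0, gain = 1) {
  const panner = new StereoPannerNode(ctx, { pan });
  const gainNode = new GainNode(ctx, { gain });
  panner.connect(gainNode);
  return { input: panner, output: gainNode };
}

const scriptEventStage = panGainStage(0, 0.5);  // Script Event Pan/Gain
const recordingStage = panGainStage(-0.5, 0.8); // Audio Recording Pan/Gain
const textStage = panGainStage(0, 1);           // Text Pan/Gain

// Programme audio passes through the Script Event stage; the Audio Recording
// is mixed in at the Text stage (connecting two outputs to one input mixes).
programmeSource.connect(scriptEventStage.input);
recordingSource.connect(recordingStage.input);
scriptEventStage.output.connect(textStage.input);
recordingStage.output.connect(textStage.input);

// The Text stage output is mixed onto the master bus.
textStage.output.connect(ctx.destination);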

The above examples are simplified in at least two ways:

F. Profiles

This section defines a [TTML2] content profile and a processor profile by expressing dispositions against a set of features and extensions. The DAPT extensions are defined in G. Extensions.

The Profile Semantics specified in [TTML2] apply.

A TTML Profile specification is a document that lists all the features of TTML that are required / optional / prohibited within "document instances" (files) and "processors" (things that process the files), and any extensions or constraints.

A timed text content document instance that conforms to the content profile defined herein:

Note

A timed text content document instance, by definition, satisfies the requirements of Section 3.1 of [TTML2], and hence a timed text content document instance that conforms to a profile defined herein is also a conforming TTML2 Document Instance.

A Presentation processor that conforms to the processor profile defined in this specification:

A Transformation processor that conforms to the processor profile defined in this specification:

The dispositions required, permitted, optional and prohibited as used in this specification map to the [TTML2] <ttp:feature> and <ttp:extension> elements' value attribute values as follows:

DAPT disposition | <ttp:feature> or <ttp:extension> value attribute in content profile | in processor profile
required | required | required
permitted | optional | required
optional | optional | optional
prohibited | prohibited | optional
Note

The use of the terms presentation processor and transformation processor within this document does not imply conformance per se to any of the Standard Profiles defined in [TTML2]. In other words, it is not considered an error for a presentation processor or transformation processor to conform to the profile defined in this document without also conforming to the TTML2 Presentation Profile or the TTML2 Transformation Profile.

Note

The use of the [TTML2] profiling sub-system to describe DAPT conformance within this specification is not intended to imply that DAPT processors are required to support any features of that system other than those for which support is explicitly required by DAPT.

Note

This document does not specify presentation processor or transformation processor behavior when processing or transforming a non-conformant timed text content document instance.

Note

The permitted and prohibited dispositions do not refer to the specification of a <ttp:feature> or <ttp:extension> element as being permitted or prohibited within a <ttp:profile> element.

F.1 Disposition of Features and Extensions

The features and extensions listed in this section express the minimal requirements for DAPT Documents, Presentation Processors, and Transformation Processors. DAPT Documents MAY additionally conform to other profiles, and include syntax not prohibited by the DAPT content profile. Presentation Processors and Transformation Processors MAY support additional syntax and semantics relating to other profiles.

Note

For example, a DAPT Script can include syntax permitted by the IMSC ([TTML-IMSC1.2]) profiles of [TTML2] to enhance the presentation of scripts to actors recording audio, or to add styling important for later usage in subtitle or caption creation.

Editor's note

Editorial task: go through this list of features and check the disposition of each. There should be no prohibited features that are permitted in IMSC.

Feature or Extension | Disposition | Additional provision
Relative to the TT Feature namespace
#animate-fill | permitted
#animate-minimal | permitted
#animation-out-of-line | prohibited | See 4.10 Mixing Instruction.
#audio | permitted
#audio-description | permitted
#audio-speech | permitted
#bidi | permitted
#bidi-version-2 | permitted
#chunk | permitted
#clockMode | prohibited
#clockMode-gps | prohibited
#clockMode-local | prohibited
#clockMode-utc | prohibited
#content | permitted
#contentProfiles | permitted | See 5.6.2 ttp:contentProfiles and G.3 #contentProfiles-root.
#contentProfiles-combined | optional | See 5.6.5 Other TTML2 Profile Vocabulary.
#core | permitted
#data | permitted
#direction | permitted
#dropMode | prohibited
#dropMode-dropNTSC | prohibited
#dropMode-dropPAL | prohibited
#dropMode-nonDrop | prohibited
#embedded-audio | permitted
#embedded-data | permitted
#frameRate | permitted | See 5.7.3 ttp:frameRate.
#frameRateMultiplier | permitted
#gain | permitted
#markerMode | prohibited
#markerMode-continuous | prohibited
#markerMode-discontinuous | prohibited
#metadata | permitted
#metadata-item | permitted
#metadata-version-2 | permitted
#pan | permitted
#permitFeatureNarrowing | optional | See 5.6.5 Other TTML2 Profile Vocabulary.
#permitFeatureWidening | optional | See 5.6.5 Other TTML2 Profile Vocabulary.
#pitch | permitted
#presentation-audio | permitted
#processorProfiles | optional | See 5.6.4 ttp:processorProfiles.
#processorProfiles-combined | optional | See 5.6.5 Other TTML2 Profile Vocabulary.
#profile | partially permitted | See 5.6.3 ttp:profile.
#profile-full-version-2 | partially permitted | See 5.6.5 Other TTML2 Profile Vocabulary.
#profile-version-2 | partially permitted | See 5.6.5 Other TTML2 Profile Vocabulary.
#resources | permitted
#set | permitted
#set-fill | permitted
#set-multiple-styles | permitted
#source | permitted
#speak | permitted
#speech | permitted
#structure | required
#styling | permitted
#styling-chained | permitted
#styling-inheritance-content | permitted
#styling-inline | permitted
#styling-referential | permitted
#subFrameRate | prohibited
#tickRate | permitted | See 5.7.4 ttp:tickRate.
#time-clock | permitted
#time-clock-with-frames | prohibited
#time-offset-with-frames | permitted | See 5.7.3 ttp:frameRate.
#time-offset-with-ticks | permitted | See 5.7.4 ttp:tickRate.
#time-offset | permitted
#time-wall-clock | prohibited
#timeBase-clock | prohibited
#timeBase-media | required | See 5.7.1 ttp:timeBase. NOTE: [TTML1] specifies that the default timebase is "media" if the ttp:timeBase attribute is not specified on the <tt> element.
#timeBase-smpte | prohibited
#timeContainer | prohibited | See 5.7.2 timeContainer.
#timing | permitted | See 5.7.5 Time expressions.
#transformation | permitted | See constraints at #profile.
#unicodeBidi | permitted
#unicodeBidi-isolate | permitted
#unicodeBidi-version-2 | permitted
#xlink | permitted
Relative to the DAPT Extension namespace
#agent | permitted | This is the profile expression of 4.2 Character.
#contentProfiles-root | required | This is the profile expression of 5.6.2 ttp:contentProfiles.
#daptOriginTimecode | permitted | This is the profile expression of D.1 DAPT Origin Timecode.
#descType | permitted | This is the profile expression of daptm:descType.
#onScreen | permitted | This is the profile expression of 4.6 On Screen.
#profile-root | prohibited | This is the profile expression of the prohibition of the ttp:profile attribute on the <tt> element as specified in 5.6.3 ttp:profile.
#represents | required | This is the profile expression of Represents as applied to Script Event.
#scriptEventGrouping | permitted | This is the profile expression of the permission to nest <div> elements described in 4.3 Script Event.
#scriptEventMapping | optional | This is the profile expression of 6.3 Handling <div> and <p> elements.
#scriptRepresents | required | This is the profile expression of Script Represents.
#scriptType-root | required | This is the profile expression of 4.1.3 Script Type.
#serialization | required | This is the profile expression of 5.1 Document Encoding.
#source-data | prohibited | This is the profile expression of the prohibition of <source> child elements of <data> elements as specified in 4.9.1 Audio Recording.
#textLanguageSource | permitted | This is the profile expression of 4.5 Text Language Source as required at 4.4 Text.
#xmlId-div | required | This is the profile expression of 4.3 Script Event.
#xmlLang-audio-nonMatching | prohibited | This is the profile expression of the prohibition of the xml:lang attribute on the <audio> element having a different computed value to the parent element and descendant or referenced <source> and <data> elements, as specified in 4.9.1 Audio Recording.
#xmlLang-root | required | This is the profile expression of 4.1.2 Default Language.

F.2 DAPT Content Profile

The DAPT Content Profile expresses the conformance requirements of DAPT Scripts using the profile mechanism of [TTML2]. It can be used by a validating processor that supports the DAPT Processor Profile to validate a DAPT Document.

There is no requirement to include the DAPT Content Profile within a DAPT Document.

<?xml version="1.0" encoding="utf-8"?>
<!-- this file defines the "dapt-content" profile of ttml -->
<profile xmlns="http://www.w3.org/ns/ttml#parameter"
    designator="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
    combine="mostRestrictive"
    type="content">
  <features xml:base="http://www.w3.org/ns/ttml/feature/">
    <!-- required (mandatory) feature support -->
    <feature value="required">#structure</feature>
    <feature value="required">#timeBase-media</feature>
    <!-- optional (voluntary) feature support -->
    <feature value="optional">#animate-fill</feature>
    <feature value="optional">#animate-minimal</feature>
    <feature value="optional">#audio</feature>
    <feature value="optional">#audio-description</feature>
    <feature value="optional">#audio-speech</feature>
    <feature value="optional">#bidi</feature>
    <feature value="optional" extends="#bidi">#bidi-version-2</feature>
    <feature value="optional">#chunk</feature>
    <feature value="optional">#content</feature>
    <feature value="optional">#contentProfiles</feature>
    <feature value="optional">#contentProfiles-combined</feature>
    <feature value="optional">#core</feature>
    <feature value="optional">#data</feature>
    <feature value="optional">#direction</feature>
    <feature value="optional">#embedded-audio</feature>
    <feature value="optional">#embedded-data</feature>
    <feature value="optional">#frameRate</feature>
    <feature value="optional">#frameRateMultiplier</feature>
    <feature value="optional">#gain</feature>
    <feature value="optional">#metadata</feature>
    <feature value="optional">#metadata-item</feature>
    <feature value="optional" extends="#metadata">#metadata-version-2</feature>
    <feature value="optional">#pan</feature>
    <feature value="optional">#permitFeatureNarrowing</feature>
    <feature value="optional">#permitFeatureWidening</feature>
    <feature value="optional">#pitch</feature>
    <feature value="optional">#presentation-audio</feature>
    <feature value="optional">#processorProfiles</feature>
    <feature value="optional">#processorProfiles-combined</feature>
    <feature value="optional">#resources</feature>
    <feature value="optional" extends="#animation">#set</feature>
    <feature value="optional">#set-fill</feature>
    <feature value="optional">#set-multiple-styles</feature>
    <feature value="optional">#source</feature>
    <feature value="optional">#speak</feature>
    <feature value="optional">#speech</feature>
    <feature value="optional">#styling</feature>
    <feature value="optional">#styling-chained</feature>
    <feature value="optional">#styling-inheritance-content</feature>
    <feature value="optional">#styling-inline</feature>
    <feature value="optional">#styling-referential</feature>
    <feature value="optional">#tickRate</feature>
    <feature value="optional">#time-clock</feature>
    <feature value="optional">#time-offset</feature>
    <feature value="optional">#time-offset-with-frames</feature>
    <feature value="optional">#time-offset-with-ticks</feature>
    <feature value="optional">#timing</feature>
    <feature value="optional">#transformation</feature>
    <feature value="optional">#unicodeBidi</feature>
    <feature value="optional">#unicodeBidi-isolate</feature>
    <feature value="optional" extends="#unicodeBidi">#unicodeBidi-version-2</feature>
    <feature value="optional">#xlink</feature>
    <!-- prohibited feature support -->
    <feature value="prohibited">#animation-out-of-line</feature>
    <feature value="prohibited">#clockMode</feature>
    <feature value="prohibited">#clockMode-gps</feature>
    <feature value="prohibited">#clockMode-local</feature>
    <feature value="prohibited">#clockMode-utc</feature>
    <feature value="prohibited">#dropMode</feature>
    <feature value="prohibited">#dropMode-dropNTSC</feature>
    <feature value="prohibited">#dropMode-dropPAL</feature>
    <feature value="prohibited">#dropMode-nonDrop</feature>
    <feature value="prohibited">#markerMode</feature>
    <feature value="prohibited">#markerMode-continuous</feature>
    <feature value="prohibited">#markerMode-discontinuous</feature>
    <feature value="prohibited">#subFrameRate</feature>
    <feature value="prohibited">#time-clock-with-frames</feature>
    <feature value="prohibited">#time-wall-clock</feature>
    <feature value="prohibited">#timeBase-clock</feature>
    <feature value="prohibited">#timeBase-smpte</feature>
    <feature value="prohibited">#timeContainer</feature>
  </features>
  <extensions xml:base="http://www.w3.org/ns/ttml/profile/dapt/extension/">
    <!-- required (mandatory) extension support -->
    <extension value="required">#contentProfiles-root</extension>
    <extension value="required">#represents</extension>
    <extension value="required">#scriptRepresents</extension>
    <extension value="required">#scriptType-root</extension>
    <extension value="required">#serialization</extension>
    <extension value="required">#xmlId-div</extension>
    <extension value="required">#xmlLang-root</extension>
    <!-- optional (voluntary) extension support -->
    <extension value="optional">#agent</extension>
    <extension value="optional">#daptOriginTimecode</extension>
    <extension value="optional">#descType</extension>
    <extension value="optional">#onScreen</extension>
    <extension value="optional">#scriptEventGrouping</extension>
    <extension value="optional">#scriptEventMapping</extension>
    <extension value="optional">#textLanguageSource</extension>
    <!-- prohibited extension support -->
    <extension value="prohibited">#profile-root</extension>
    <extension value="prohibited">#source-data</extension>
    <extension value="prohibited">#xmlLang-audio-nonMatching</extension>
  </extensions>
</profile>

F.3 DAPT Processor Profile

The DAPT Processor Profile expresses the processing requirements of DAPT Scripts using the profile mechanism of [TTML2]. A processor that supports the required features and extensions of the DAPT Processor Profile can, minimally, process all permitted features within a DAPT Document.

There is no requirement to include the DAPT Processor Profile within a DAPT Document.

<?xml version="1.0" encoding="utf-8"?>
<!-- this file defines the "dapt-processor" profile of ttml -->
<profile xmlns="http://www.w3.org/ns/ttml#parameter"
         designator="http://www.w3.org/ns/ttml/profile/dapt1.0/processor"
         combine="mostRestrictive"
         type="processor">
  <features xml:base="http://www.w3.org/ns/ttml/feature/">
    <!-- required (mandatory) feature support -->
    <feature value="required">#animate-fill</feature>
    <feature value="required">#animate-minimal</feature>
    <feature value="required">#audio</feature>
    <feature value="required">#audio-description</feature>
    <feature value="required">#audio-speech</feature>
    <feature value="required">#bidi</feature>
    <feature value="required" extends="#bidi">#bidi-version-2</feature>
    <feature value="required">#chunk</feature>
    <feature value="required">#content</feature>
    <feature value="required">#contentProfiles</feature>
    <feature value="required">#core</feature>
    <feature value="required">#data</feature>
    <feature value="required">#direction</feature>
    <feature value="required">#embedded-audio</feature>
    <feature value="required">#embedded-data</feature>
    <feature value="required">#frameRate</feature>
    <feature value="required">#frameRateMultiplier</feature>
    <feature value="required">#gain</feature>
    <feature value="required">#metadata</feature>
    <feature value="required">#metadata-item</feature>
    <feature value="required" extends="#metadata">#metadata-version-2</feature>
    <feature value="required">#pan</feature>
    <feature value="required">#pitch</feature>
    <feature value="required">#presentation-audio</feature>
    <feature value="required">#resources</feature>
    <feature value="required" extends="#animation">#set</feature>
    <feature value="required">#set-fill</feature>
    <feature value="required">#set-multiple-styles</feature>
    <feature value="required">#source</feature>
    <feature value="required">#speak</feature>
    <feature value="required">#speech</feature>
    <feature value="required">#structure</feature>
    <feature value="required">#styling</feature>
    <feature value="required">#styling-chained</feature>
    <feature value="required">#styling-inheritance-content</feature>
    <feature value="required">#styling-inline</feature>
    <feature value="required">#styling-referential</feature>
    <feature value="required">#tickRate</feature>
    <feature value="required">#time-clock</feature>
    <feature value="required">#time-offset</feature>
    <feature value="required">#time-offset-with-frames</feature>
    <feature value="required">#time-offset-with-ticks</feature>
    <feature value="required">#timeBase-media</feature>
    <feature value="required">#timing</feature>
    <feature value="required">#transformation</feature>
    <feature value="required">#unicodeBidi</feature>
    <feature value="required">#unicodeBidi-isolate</feature>
    <feature value="required" extends="#unicodeBidi">#unicodeBidi-version-2</feature>
    <feature value="required">#xlink</feature>
    <!-- optional (voluntary) feature support -->
    <feature value="optional">#animation-out-of-line</feature>
    <feature value="optional">#clockMode</feature>
    <feature value="optional">#clockMode-gps</feature>
    <feature value="optional">#clockMode-local</feature>
    <feature value="optional">#clockMode-utc</feature>
    <feature value="optional">#contentProfiles-combined</feature>
    <feature value="optional">#dropMode</feature>
    <feature value="optional">#dropMode-dropNTSC</feature>
    <feature value="optional">#dropMode-dropPAL</feature>
    <feature value="optional">#dropMode-nonDrop</feature>
    <feature value="optional">#markerMode</feature>
    <feature value="optional">#markerMode-continuous</feature>
    <feature value="optional">#markerMode-discontinuous</feature>
    <feature value="optional">#permitFeatureNarrowing</feature>
    <feature value="optional">#permitFeatureWidening</feature>
    <feature value="optional">#processorProfiles</feature>
    <feature value="optional">#processorProfiles-combined</feature>
    <feature value="optional">#subFrameRate</feature>
    <feature value="optional">#time-clock-with-frames</feature>
    <feature value="optional">#time-wall-clock</feature>
    <feature value="optional">#timeBase-clock</feature>
    <feature value="optional">#timeBase-smpte</feature>
    <feature value="optional">#timeContainer</feature>
  </features>
  <extensions xml:base="http://www.w3.org/ns/ttml/profile/dapt/extension/">
    <!-- required (mandatory) extension support -->
    <extension value="required">#agent</extension>
    <extension value="required">#contentProfiles-root</extension>
    <extension value="required">#daptOriginTimecode</extension>
    <extension value="required">#descType</extension>
    <extension value="required">#onScreen</extension>
    <extension value="required">#represents</extension>
    <extension value="required">#scriptEventGrouping</extension>
    <extension value="required">#scriptRepresents</extension>
    <extension value="required">#scriptType-root</extension>
    <extension value="required">#serialization</extension>
    <extension value="required">#textLanguageSource</extension>
    <extension value="required">#xmlId-div</extension>
    <extension value="required">#xmlLang-root</extension>
    <!-- optional (voluntary) extension support -->
    <extension value="optional">#profile-root</extension>
    <extension value="optional">#scriptEventMapping</extension>
    <extension value="optional">#source-data</extension>
    <extension value="optional">#xmlLang-audio-nonMatching</extension>
  </extensions>
</profile>
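
As a non-normative illustration, a processor implementer might compare the implementation's capability set against the required designators in the processor profile document above. The following Python sketch shows one such check; the SUPPORTED set, function names, and reporting are hypothetical and not part of this specification.

import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def required_designators(profile_xml: str) -> set:
    """Collect the absolute designator of every required feature and extension."""
    root = ET.fromstring(profile_xml)
    required = set()
    for container in root:  # the <features> and <extensions> elements
        base = container.get(XML_BASE, "")  # xml:base supplies the absolute prefix
        for entry in container:  # <feature> / <extension> children
            if entry.get("value") == "required":
                required.add(base + entry.text.strip())
    return required

# Hypothetical: the absolute designators this implementation supports.
SUPPORTED = set()

def supports_profile(profile_xml: str) -> bool:
    missing = required_designators(profile_xml) - SUPPORTED
    for designator in sorted(missing):
        print("unsupported required designator:", designator)
    return not missing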

G. Extensions

G.1 General

The following sections define extension designations, expressed as relative URIs (fragment identifiers) relative to the DAPT Extension Namespace base URI. These extension designations are used in F. Profiles to describe the normative provisions of DAPT that are not expressed by [TTML2] profile features.

G.2 #agent

A transformation processor supports the #agent extension if it recognizes and is capable of transforming values of the following elements and attributes on the <ttm:agent> element:

and if it recognizes and is capable of transforming each of the following value combinations:

A presentation processor supports the #agent extension if it implements presentation semantic support of the above listed elements, attributes and value combinations.

G.3 #contentProfiles-root

A transformation processor supports the #contentProfiles-root extension if it recognizes and is capable of transforming values of the ttp:contentProfiles attribute on the <tt> element.

A presentation processor supports the #contentProfiles-root extension if it implements presentation semantic support of the ttp:contentProfiles attribute on the <tt> element.

Note
This extension is defined as a subset of #contentProfiles to avoid requiring processor support for the ttp:profile attribute and processor profile inference semantics.

G.4 #daptOriginTimecode

A transformation processor supports the #daptOriginTimecode extension if it recognizes and is capable of transforming values of the <daptm:daptOriginTimecode> element.

No presentation processor behaviour is defined for the #daptOriginTimecode extension.

G.5 #descType

A transformation processor supports the #descType extension if it recognizes and is capable of transforming values of the daptm:descType attribute on the <ttm:desc> element.

A presentation processor supports the #descType extension if it implements presentation semantic support of the daptm:descType attribute on the <ttm:desc> element.

G.6 #onScreen

A transformation processor supports the #onScreen extension if it recognizes and is capable of transforming values of the daptm:onScreen attribute on the <div> element.

A presentation processor supports the #onScreen extension if it implements presentation semantic support of the daptm:onScreen attribute on the <div> element.

G.7 #profile-root

A transformation processor supports the #profile-root extension if it recognizes and is capable of transforming values of the ttp:profile attribute on the <tt> element.

A presentation processor supports the #profile-root extension if it implements presentation semantic support of the ttp:profile attribute on the <tt> element.

G.8 #represents

A transformation processor supports the #represents extension if it recognizes and is capable of transforming values of the daptm:represents attribute.

A presentation processor supports the #represents extension if it implements presentation semantic support of the daptm:represents attribute.

An example of a transformation processor that supports this extension is a validating processor that reports an error if the extension is permitted by a content profile but the timed text content document instance claiming conformance to that profile has a <div> element with a daptm:represents attribute whose value is not conformant with the requirements defined herein.
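
A validating check along those lines might resemble the following non-normative Python sketch. The CONFORMANT_DESCRIPTORS set is a hypothetical stand-in for the registry table in 4.1.6.2, the daptm namespace URI and the whitespace tokenization of the attribute value are assumptions made for illustration, and invalid_represents_divs is not a name defined by this specification.

import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
# Assumption: the DAPT metadata namespace used for daptm:represents.
DAPTM_REPRESENTS = "{http://www.w3.org/ns/ttml/profile/dapt#metadata}represents"

# Hypothetical stand-in for the <content-descriptor> registry table values.
CONFORMANT_DESCRIPTORS = set()

def invalid_represents_divs(doc: str) -> list:
    """Return <div> elements whose daptm:represents value is non-conformant."""
    root = ET.fromstring(doc)
    bad = []
    for div in root.iter("{%s}div" % TT_NS):
        value = div.get(DAPTM_REPRESENTS)
        if value is None:
            continue
        # Assumption: the value is a whitespace-separated list of descriptors.
        if any(tok not in CONFORMANT_DESCRIPTORS for tok in value.split()):
            bad.append(div)
    return bad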

G.9 #scriptEventGrouping

A transformation processor supports the #scriptEventGrouping extension if it recognizes and is capable of transforming <div> elements that contain <div> elements.

Note

Support for the #scriptEventGrouping extension does not imply support for the #scriptEventMapping extension.

A presentation processor supports the #scriptEventGrouping extension if it implements presentation semantic support for <div> elements that contain <div> elements.

G.10 #scriptEventMapping

A transformation processor supports the #scriptEventMapping extension if, when mapping a DAPT document into an internal representation of the DAPT data model, it implements the processing requirements specified at 6.3 Handling <div> and <p> elements.

Note

No support for the #scriptEventMapping extension is required for presentation processors because there are no presentation semantics that either require, or depend upon, mapping a DAPT document into an internal representation of the DAPT data model. A presentation processor that does perform such a mapping can also be considered to be a transformation processor for the purpose of this extension.

G.11 #scriptRepresents

A transformation processor supports the #scriptRepresents extension if it recognizes and is capable of transforming values of the daptm:scriptRepresents attribute on the <tt> element.

A presentation processor supports the #scriptRepresents extension if it implements presentation semantic support of the daptm:scriptRepresents attribute on the <tt> element.

An example of a transformation processor that supports this extension is a validating processor that reports an error if the extension is required by a content profile but the timed text content document instance claiming conformance to that profile either does not have a daptm:scriptRepresents attribute on the <tt> element or has one whose value is not conformant with the requirements defined herein.

G.12 #scriptType-root

A transformation processor supports the #scriptType-root extension if it recognizes and is capable of transforming values of the daptm:scriptType attribute on the <tt> element.

A presentation processor supports the #scriptType-root extension if it implements presentation semantic support of the daptm:scriptType attribute on the <tt> element.

An example of a transformation processor that supports this extension is a validating processor that provides appropriate feedback, for example warnings, when the SHOULD requirements defined in 4.1.3 Script Type for a DAPT Document's daptm:scriptType are not met, and that reports an error if the extension is required by a content profile but the timed text content document instance claiming conformance to that profile either does not have a daptm:scriptType attribute on the <tt> element or has one whose value is not defined herein.

G.13 #serialization

A serialized document that is valid with respect to the #serialization extension is an XML 1.0 [xml] document encoded using the UTF-8 character encoding as specified in [UNICODE], that contains no entity declarations and no entity references other than to predefined entities.

A transformation processor or a presentation processor supports the #serialization extension if it can read a serialized document as defined above.

A transformation processor that writes documents supports the #serialization extension if it can write a serialized document as defined above.
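
One way to test these constraints is sketched below in non-normative Python, using expat's entity-declaration callback: any entity declaration fails the check, and a reference to an undeclared (hence non-predefined) entity surfaces as a parse error. The function name is illustrative only, and the UTF-8 test is deliberately simplistic.

import xml.parsers.expat

def is_valid_serialization(raw: bytes) -> bool:
    """Check the #serialization constraints on a serialized document."""
    try:
        raw.decode("utf-8")  # must be decodable as UTF-8
    except UnicodeDecodeError:
        return False

    ok = True

    def entity_decl(*_args):  # any entity declaration is disallowed
        nonlocal ok
        ok = False

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = entity_decl
    try:
        # References to undeclared, non-predefined entities raise ExpatError;
        # the predefined entities (&amp; &lt; &gt; &apos; &quot;) parse normally.
        parser.Parse(raw, True)
    except xml.parsers.expat.ExpatError:
        return False
    return ok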

G.14 #source-data

A transformation processor supports the #source-data extension if it recognizes and is capable of transforming values of the <source> element child of a <data> element.

A presentation processor supports the #source-data extension if it implements presentation semantic support of the <source> element child of a <data> element.

G.15 #textLanguageSource

A transformation processor supports the #textLanguageSource extension if it recognizes and is capable of transforming values of the daptm:langSrc attribute.

A presentation processor supports the #textLanguageSource extension if it implements presentation semantic support of the daptm:langSrc attribute.

G.16 #xmlId-div

A transformation processor supports the #xmlId-div extension if it recognizes and is capable of transforming values of the xml:id attribute on the <div> element.

A presentation processor supports the #xmlId-div extension if it implements presentation semantic support of the xml:id attribute on the <div> element.

G.17 #xmlLang-audio-nonMatching

A transformation processor supports the #xmlLang-audio-nonMatching extension if it recognizes and is capable of transforming values of the xml:lang attribute on the <audio> element that differ from the computed value of the same attribute of its parent element or any of its descendant or referenced <source> or <data> elements, known as non-matching values.

A presentation processor supports the #xmlLang-audio-nonMatching extension if it implements presentation semantic support of such non-matching xml:lang attribute values.
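
A non-normative Python sketch of how a processor might detect non-matching values follows. It computes inherited xml:lang values while walking the tree; the handling of descendant <source> and <data> elements is simplified to their own explicit attributes, and resolution of referenced (rather than descendant) resources is omitted.

import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def non_matching_audio_elements(doc: str) -> list:
    """Return <audio> elements carrying non-matching xml:lang values."""
    root = ET.fromstring(doc)
    flagged = []

    def walk(element, inherited):
        computed = element.get(XML_LANG, inherited)
        if element.tag == "{%s}audio" % TT_NS:
            # Simplification: descendant <source>/<data> computed values are
            # approximated from their own explicit xml:lang attributes.
            mismatch = computed != inherited or any(
                child.get(XML_LANG, computed) != computed
                for child in element.iter()
                if child.tag in ("{%s}source" % TT_NS, "{%s}data" % TT_NS)
            )
            if mismatch:
                flagged.append(element)
        for child in element:
            walk(child, computed)

    walk(root, "")
    return flagged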

G.18 #xmlLang-root

A transformation processor supports the #xmlLang-root extension if it recognizes and is capable of transforming values of the xml:lang attribute on the <tt> element and the additional semantics specified in 4.1.2 Default Language.

A presentation processor supports the #xmlLang-root extension if it implements presentation semantic support of the xml:lang attribute on the <tt> element and the additional semantics specified in 4.1.2 Default Language.

H. Registry Section

H.1 Registry Definition

This section specifies the registry definition, consisting of the custodianship, the change process, and the core requirements of the registry tables defined in this document.

H.1.1 Custodianship

The custodian of this w3c registry is the Timed Text Working Group (TTWG). If the TTWG is unable to fulfil the role of custodian, for example if it has been closed, the custodian in lieu is the W3C Team.

H.1.2 Change Process

H.1.2.1 Requesting a change

Changes to this W3C Registry MUST be requested (the change request) using any one of the following options:

The change request MUST include enough information for the custodian to be able to identify all of:

The proposer of the change MAY open a pull request (or equivalent) on the version control system, with that pull request containing the proposed changes. If a pull request is opened then a corresponding issue MUST also be opened and the pull request MUST be linked to that issue.

Note
A pull request representing the proposals made in the change request is required before TTWG review can proceed.

H.1.2.2 Change request assessment process

The process for assessing a change request depends on the custodian.

H.1.2.2.1 Custodian is TTWG

If the custodian is the TTWG:

An approved change request is enacted by merging its related pull request into the version control system and publishing the updated version of this document.

H.1.2.2.2 Custodian is the W3C Team

If the custodian is the W3C Team, the Team MUST seek wide review of the change request and offer a review period of at least 4 weeks before assessing, from the responses received, whether there is consensus amongst the respondents.

The Team MAY require a pull request on the version control system to be opened as the basis of the review.

If there is such consensus, the Team MUST make the proposed changes.

H.1.3 Registry Table Constraints

This section defines constraints on the registry tables defined in this document. Each registry table consists of a set of registry entries. Each registry table has an associated registry table definition in H.2 Registry Table Definitions, which lists the fields present in each registry entry.

H.1.3.1 Registry Entries

Each registry entry has a status, a unique key, and, if appropriate, other fields, for example any notes, a description, or a reference to some other defining entity.

The registry table definition MUST define the fields and the key to be used in each registry entry.

Note
It is possible to use any field or combination of fields as the key. For example, a field called "4 Character Code" might be used as the key. Another example is two fields called "Short name" and "Revision number" which, in combination, might be used as the key.

H.1.3.1.1 Status

The registry entry status field reflects the maturity of that entry. Permitted values are:

Provisional
Final
Deprecated

No other values are permitted.

H.1.3.1.1.1 Provisional

Registry entries with a status of Provisional MAY be changed or deleted. Their status MAY be changed to Final or Deprecated.

Registry entry keys in Provisional entries that were later deleted MAY be reused.

Newly created registry entries SHOULD have status Provisional.

H.1.3.1.1.2 Final

Registry entries with a status of Final MUST NOT be deleted or changed. Their status MAY be changed to Deprecated.

Registry entry keys in Final entries MUST NOT be reused.

Newly created registry entries MAY have status Final.

H.1.3.1.1.3 Deprecated

Registry entries with a status of Deprecated MUST NOT be deleted or changed. Their status MAY be changed to Final unless that would result in a duplicate key within the set of entries whose status is either Provisional or Final.

Registry entry keys in Deprecated entries that were previously Provisional and never Final MAY be reused.

Registry entry keys in Deprecated entries that were previously Final MUST NOT be reused.

Newly created registry entries MUST NOT have status Deprecated.
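
The status lifecycle above amounts to a small state machine, sketched non-normatively in Python below; the names are illustrative, and the Deprecated-to-Final transition is additionally conditional on key uniqueness as described in H.1.3.1.1.3.

from enum import Enum

class Status(Enum):
    PROVISIONAL = "Provisional"
    FINAL = "Final"
    DEPRECATED = "Deprecated"

# Permitted status transitions per H.1.3.1.1.
PERMITTED_TRANSITIONS = {
    Status.PROVISIONAL: {Status.FINAL, Status.DEPRECATED},
    Status.FINAL: {Status.DEPRECATED},
    Status.DEPRECATED: {Status.FINAL},  # only if the key remains unique
}

def may_reuse_key(history: list, deleted: bool) -> bool:
    """A key is reusable only if its entry was never Final and is now gone
    (deleted while Provisional) or is currently Deprecated."""
    if Status.FINAL in history:
        return False
    return deleted or history[-1] == Status.DEPRECATED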

H.2 Registry Table Definitions

This section defines registry tables and locates their registry entries.

H.2.1 daptm:descType registry table definition

The registry table for daptm:descType defines a set of values that can be used in the daptm:descType attribute.

The key is the "daptm:descType" field. The "description" field describes the intended purpose of each value.

The registry entries for this registry table are located in 4.8 Script Event Description.

The key values for this registry table MUST NOT begin with x-, which is reserved for user extensions.

H.2.2 <content-descriptor> registry table definition

The registry table for <content-descriptor> defines a set of values that can be used in the daptm:represents attribute.

The key is the "<content-descriptor>" field. The "Description" field describes the type of media content represented by each value. The "Example usage" field describes the type of script in which the described content type is commonly found.

The registry entries for this registry table are located in 4.1.6.2 <content-descriptor> values.

The key values for this registry table MUST NOT begin with x-, which is reserved for user extensions.
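
Both registry table definitions share this key constraint, which a registration tool might check with a trivial, non-normative test such as the following; the function name is illustrative only.

def is_registrable_key(key: str) -> bool:
    # Keys beginning with "x-" are reserved for user extensions and
    # cannot be registered in either registry table.
    return not key.startswith("x-")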

I. Acknowledgments

The Editors acknowledge the current and former members of the Timed Text Working Group, the members of other W3C Working Groups, the members of the Audio Description Community Group, and industry experts in other forums who have contributed directly or indirectly to the process or content of this document.

The Editors wish to especially acknowledge the following contributions by members: Glenn Adams, Skynav; Pierre-Anthony Lemieux, MovieLabs; Hewson Maxwell, Ericsson; Chris Needham, British Broadcasting Corporation; Atsushi Shimono, W3C; Matt Simpson, Invited Expert; Andreas Tai, Invited Expert.

J. References

J.1 Normative references

[BCP47]
Tags for Identifying Languages. A. Phillips, Ed.; M. Davis, Ed. IETF. September 2009. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc5646
[EBU-R37]
EBU Recommendation R37-2007. The relative timing of the sound and vision components of a television signal. EBU/UER. February 2007. URL: https://tech.ebu.ch/publications/r037
[EBU-TT-3390]
EBU-TT Part M, Metadata Definitions. EBU/UER. May 2017. URL: https://tech.ebu.ch/publications/tech3390
[MIME-TYPES]
Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. N. Freed; N. Borenstein. IETF. November 1996. Draft Standard. URL: https://www.rfc-editor.org/rfc/rfc2046
[namespaceState]
The Disposition of Names in an XML Namespace. Norman Walsh. W3C. 29 March 2006. W3C Working Draft. URL: https://www.w3.org/TR/namespaceState/
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[RFC8174]
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174
[TTML-IMSC1.2]
TTML Profiles for Internet Media Subtitles and Captions 1.2. Pierre-Anthony Lemieux. W3C. 4 August 2020. W3C Recommendation. URL: https://www.w3.org/TR/ttml-imsc1.2/
[TTML1]
Timed Text Markup Language 1 (TTML1) (Third Edition). Glenn Adams; Pierre-Anthony Lemieux. W3C. 8 November 2018. W3C Recommendation. URL: https://www.w3.org/TR/ttml1/
[TTML2]
Timed Text Markup Language 2 (TTML2) (2nd Edition). W3C. 9 March 2021. W3C Candidate Recommendation. URL: https://www.w3.org/TR/2021/CR-ttml2-20210309
[UNICODE]
The Unicode Standard. Unicode Consortium. URL: https://www.unicode.org/versions/latest/
[w3c-process]
W3C Process Document. Elika J. Etemad (fantasai); Florian Rivoal. W3C. 3 November 2023. URL: https://www.w3.org/policies/process/
[XML]
Extensible Markup Language (XML) 1.0 (Fifth Edition). Tim Bray; Jean Paoli; Michael Sperberg-McQueen; Eve Maler; François Yergeau et al. W3C. 26 November 2008. W3C Recommendation. URL: https://www.w3.org/TR/xml/
[xml-names]
Namespaces in XML 1.0 (Third Edition). Tim Bray; Dave Hollander; Andrew Layman; Richard Tobin; Henry Thompson et al. W3C. 8 December 2009. W3C Recommendation. URL: https://www.w3.org/TR/xml-names/
[xmlschema-2]
XML Schema Part 2: Datatypes Second Edition. Paul V. Biron; Ashok Malhotra. W3C. 28 October 2004. W3C Recommendation. URL: https://www.w3.org/TR/xmlschema-2/
[XPath]
XML Path Language (XPath) Version 1.0. James Clark; Steven DeRose. W3C. 16 November 1999. W3C Recommendation. URL: https://www.w3.org/TR/xpath-10/

J.2 Informative references

[BBC-WHP051]
BBC R&D White Paper WHP 051. Audio Description: what it is and how it works. N.E. Tanton, T. Ware and M. Armstrong. BBC. October 2002 (revised July 2004). URL: https://www.bbc.co.uk/rd/publications/whitepaper051
[DAPT-REQS]
DAPT Requirements. Cyril Concolato; Nigel Megitt. W3C. 12 October 2022. W3C Draft Note. URL: https://www.w3.org/TR/dapt-reqs/
[I18N-INLINE-BIDI]
Inline markup and bidirectional text in HTML. W3C. 25 June 2021. URL: https://www.w3.org/International/articles/inline-bidi-markup
[media-accessibility-reqs]
Media Accessibility User Requirements. Shane McCarron; Michael Cooper; Mark Sadecki. W3C. 3 December 2015. W3C Working Group Note. URL: https://www.w3.org/TR/media-accessibility-reqs/
[SSML]
Speech Synthesis Markup Language (SSML) Version 1.1. Daniel Burnett; Zhi Wei Shuang. W3C. 7 September 2010. W3C Recommendation. URL: https://www.w3.org/TR/speech-synthesis11/
[uml]
OMG Unified Modeling Language. Object Management Group. OMG. 1 March 2015. Normative. URL: http://www.omg.org/spec/UML/
[WCAG22]
Web Content Accessibility Guidelines (WCAG) 2.2. Michael Cooper; Andrew Kirkpatrick; Alastair Campbell; Rachael Bradley Montgomery; Charles Adams. W3C. 12 December 2024. W3C Recommendation. URL: https://www.w3.org/TR/WCAG22/
[webaudio]
Web Audio API. Paul Adenot; Hongchan Choi. W3C. 17 June 2021. W3C Recommendation. URL: https://www.w3.org/TR/webaudio-1.0/
[xml-id]
xml:id Version 1.0. Jonathan Marsh; Daniel Veillard; Norman Walsh. W3C. 9 September 2005. W3C Recommendation. URL: https://www.w3.org/TR/xml-id/

