RFC 8759 | RTP Payload for TTML Timed Text | March 2020 |
Sandford | Standards Track |
This memo describes a Real-time Transport Protocol (RTP) payload format for Timed Text Markup Language (TTML), an XML-based timed text format from W3C. This payload format is specifically targeted at streaming workflows using TTML.
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8759.
Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
TTML (Timed Text Markup Language) [TTML2] is a media type for describing timed text, such as closed captions and subtitles in television workflows or broadcasts, as XML. This document specifies how TTML should be mapped into an RTP stream in streaming workflows, including (but not restricted to) those described in the television-broadcast-oriented European Broadcasting Union Timed Text (EBU-TT) Part 3 [TECH3370] specification. This document does not define a media type for TTML but makes use of the existing application/ttml+xml media type [TTML-MTPR].
Unless otherwise stated, the term "document" refers to the TTML document being transmitted in the payload of the RTP packet(s).
The term "word" refers to a data word aligned to a specified number of bits in a computing sense and not to linguistic words that might appear in the transported text.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
Prior payload types for text are not suited to the carriage of closed captions in television workflows. "RTP Payload for Text Conversation" [RFC4103] is intended for low data rate conversation with its own session management and minimal formatting capabilities. "Definition of Events for Modem, Fax, and Text Telephony Signals" [RFC4734] deals in large part with the control signalling of facsimile and other systems. "RTP Payload Format for 3rd Generation Partnership Project (3GPP) Timed Text" [RFC4396] describes the carriage of a timed text format with much more restricted formatting capabilities than TTML. The lack of an existing format for TTML or generic XML has necessitated the creation of this payload format.
TTML2 (Timed Text Markup Language, Version 2) [TTML2] is an XML-based markup language for describing textual information with associated timing metadata. One of its primary use cases is the description of subtitles and closed captions. A number of profiles exist that adapt TTML2 for use in specific contexts [TTML-MTPR]. These include both file-based and streaming workflows.
In addition to the required RTP headers, the payload contains a section for the TTML document being transmitted (User Data Words) and a field for the length of that data. Each RTP payload contains one TTML document or part of one.
A representation of the payload format for TTML is shown in Figure 1.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       Sequence Number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization Source (SSRC) Identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Reserved           |             Length            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       User Data Words...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
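As an informal illustration of the layout in Figure 1, the following Python sketch splits a received packet into its RTP header fields and User Data Words. It is not part of the payload format. The function and dictionary key names are hypothetical, and the sketch assumes that the Length field counts the bytes of User Data Words carried in the same packet.

```python
import struct

RTP_FIXED_HEADER_LEN = 12    # V/P/X/CC, M/PT, sequence number, timestamp, SSRC
TTML_PAYLOAD_HEADER_LEN = 4  # Reserved (16 bits) followed by Length (16 bits)


def parse_ttml_rtp_packet(packet: bytes) -> dict:
    """Split one RTP packet into header fields and User Data Words."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")

    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("unsupported RTP version")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F

    # Skip the CSRC list and, if present, the RTP header extension.
    offset = RTP_FIXED_HEADER_LEN + 4 * csrc_count
    if has_extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words

    # When the P bit is set, the last octet holds the padding count.
    end = len(packet)
    if has_padding:
        pad = packet[-1]
        if pad == 0 or pad > end - offset:
            raise ValueError("invalid padding count")
        end -= pad

    if end - offset < TTML_PAYLOAD_HEADER_LEN:
        raise ValueError("truncated TTML payload header")
    _reserved, length = struct.unpack("!HH", packet[offset:offset + 4])
    user_data = packet[offset + 4:end]
    # Assumption: Length counts the bytes of User Data Words in this packet.
    if len(user_data) != length:
        raise ValueError("Length field does not match User Data Words")

    return {
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "user_data": user_data,
    }
```

Rejecting a packet whose Length field disagrees with the bytes actually present keeps a receiver from reading past the end of the payload.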
RTP packet header fields SHALL be interpreted, as per [RFC3550], with the following specifics:
TTML documents define a series of changes to text over time. TTML documents carried in User Data Words are encoded in accordance with one or more of the defined TTML profiles specified in the TTML registry [TTML-MTPR]. These profiles specify the document structure used, systems models, timing, and other considerations. TTML profiles may restrict the complexity of the changes, and operational requirements may limit the maximum duration of TTML documents by a deployment configuration. Both of these cases are out of scope of this document.
Documents carried over RTP MUST conform to the following profile, in addition to any others used.
This section defines constraints on the content of TTML documents carried over RTP.
Multiple TTML subtitle streams MUST NOT be interleaved in a single RTP stream.
The TTML document instance's root "tt" element in the "http://www.w3.org/ns/ttml" namespace MUST include a "timeBase" attribute in the "http://www.w3.org/ns/ttml#parameter" namespace containing the value "media".
This is equivalent to the TTML2 content profile definition document in Figure 2.
<?xml version="1.0" encoding="UTF-8"?>
<profile xmlns="http://www.w3.org/ns/ttml#parameter"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:tt="http://www.w3.org/ns/ttml"
    type="content"
    designator="urn:ietf:rfc:8759#content"
    combine="mostRestrictive">
    <features xml:base="http://www.w3.org/ns/ttml/feature/">
        <tt:metadata>
            <ttm:desc>
                This document is a minimal TTML2 content profile
                definition document intended to express the
                minimal requirements to apply when carrying TTML
                over RTP.
            </ttm:desc>
        </tt:metadata>
        <feature value="required">#timeBase-media</feature>
        <feature value="prohibited">#timeBase-smpte</feature>
        <feature value="prohibited">#timeBase-clock</feature>
    </features>
</profile>
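The "timeBase" constraint above can be checked before a document is sent or accepted. The following Python sketch is one possible check using the standard library XML parser. The function name is hypothetical, and the check covers only this one constraint, not full TTML validity.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTP_NS = "http://www.w3.org/ns/ttml#parameter"


def has_media_time_base(document: bytes) -> bool:
    """Return True if the root tt element carries ttp:timeBase="media"."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        # Not well-formed XML (this includes zero-length input).
        return False
    if root.tag != f"{{{TT_NS}}}tt":
        return False
    # The attribute must be present; a missing attribute is not accepted here.
    return root.get(f"{{{TTP_NS}}}timeBase") == "media"
```

A receiver handling untrusted input would normally pair a check like this with a hardened parser configuration, since this sketch does nothing to limit entity expansion.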
This section defines constraints on the processing of the TTML documents carried over RTP.
If a TTML document is assessed to be invalid, then it MUST be discarded. This includes empty documents, i.e., those of zero length. When processing a valid document, the following requirements apply.
Each TTML document becomes active at its epoch E. E MUST be set to the RTP Timestamp in the header of the RTP packet carrying the TTML document. Computed TTML media times are offset relative to E, in accordance with Section I.2 of [TTML2].
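As a worked illustration of the offset described above, the following Python sketch maps a computed TTML media time onto the RTP time line. It assumes the media time is available in seconds and that the RTP clock rate is known from session signalling. The function name is hypothetical.

```python
from fractions import Fraction

RTP_TIMESTAMP_MODULUS = 2 ** 32  # RTP timestamps are 32-bit and wrap around


def media_time_to_rtp(epoch: int, media_seconds: Fraction, clock_rate: int) -> int:
    """Return the RTP timestamp at which a media time falls, relative to epoch E."""
    ticks = round(media_seconds * clock_rate)
    return (epoch + ticks) % RTP_TIMESTAMP_MODULUS
```

For example, with E = 90000 and a 90 kHz clock, a media time of 5 seconds corresponds to RTP timestamp 540000.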
When processing a sequence of TTML documents, where each is delivered in the same RTP stream, exactly zero or one document SHALL be considered active at each moment in the RTP time line. In the event that a document D_(n-1) with E_(n-1) is active, and document D_(n) is delivered with E_(n) where E_(n-1) < E_(n), processing of D_(n-1) MUST be stopped at E_(n) and processing of D_(n) MUST begin.
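The hand-over rule above can be sketched as a selection over the epochs of the documents received so far. The Python below is illustrative only. It assumes timestamps have already been unwrapped into a monotonically increasing time line, and it does not model a processor that stops a document early once its content has ended. The function name and the use of string identifiers for documents are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def active_document(documents: Sequence[Tuple[int, str]], now: int) -> Optional[str]:
    """Return the identifier of the document active at time `now`, if any.

    `documents` holds (epoch, identifier) pairs for one RTP stream.
    """
    active = None
    best_epoch = None
    for epoch, identifier in documents:
        # A document is a candidate once its epoch has been reached; the
        # candidate with the latest epoch supersedes all earlier ones.
        if epoch <= now and (best_epoch is None or epoch > best_epoch):
            best_epoch, active = epoch, identifier
    return active
```

With documents at epochs 1000 and 6000, the first is active from 1000 up to but not including 6000, and the second from 6000 onwards.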
When all defined content within a document has ended, then processing of the document MAY be stopped. This can be tested by constructing the intermediate synchronic document sequence from the document, as defined by [TTML2]. If the last intermediate synchronic document in the sequence is both active and contains no region elements, then all defined content within the document has ended.
As described above, the RTP Timestamp does not specify the exact timing of the media in this payload format. Additionally, documents may be fragmented across multiple packets. This renders the RTCP jitter calculation unusable.
This specification defines the following TTML feature extension designation:
urn:ietf:rfc:8759#rtp-relative-media-time
The namespace "urn:ietf:rfc:8759" is as defined by [RFC2648].
A TTML content processor supports the "#rtp-relative-media-time" feature extension if it processes media times in accordance with the payload processing requirements specified in this document, i.e., that the epoch E is set to the time equivalent to the RTP Timestamp, as detailed above in Section 6.
The required syntax and semantics declared in the minimal TTML2 processor profile in Figure 3 MUST be supported by the receiver, as signified by those "feature" or "extension" elements whose "value" attribute is set to "required".
<?xml version="1.0" encoding="UTF-8"?>
<profile xmlns="http://www.w3.org/ns/ttml#parameter"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:tt="http://www.w3.org/ns/ttml"
    type="processor"
    designator="urn:ietf:rfc:8759#processor"
    combine="mostRestrictive">
    <features xml:base="http://www.w3.org/ns/ttml/feature/">
        <tt:metadata>
            <ttm:desc>
                This document is a minimal TTML2 processor profile
                definition document intended to express the
                minimal requirements of a TTML processor able to
                process TTML delivered over RTP according to
                RFC 8759.
            </ttm:desc>
        </tt:metadata>
        <feature value="required">#timeBase-media</feature>
        <feature value="optional">
            #profile-full-version-2
        </feature>
    </features>
    <extensions xml:base="urn:ietf:rfc:8759">
        <extension restricts="#timeBase-media" value="required">
            #rtp-relative-media-time
        </extension>
    </extensions>
</profile>
Note that this requirement does not imply that the receiver needs to support either TTML1 or TTML2 profile processing, i.e., the TTML2 "#profile-full-version-2" feature or any of its dependent features.
The "codecs" media type parameter MUST specify at least one processor profile. Short codes for TTML profiles are registered at [TTML-MTPR]. The processor profiles specified in "codecs" MUST be compatible with the processor profile specified in this document. Where multiple options exist in "codecs" for possible processor profile combinations (i.e., separated by the "|" operator), every permitted option MUST be compatible with the processor profile specified in this document. Where processor profiles (other than the one specified in this document) are advertised in the "codecs" parameter, the requirements of the processor profile specified in this document MAY be signalled, additionally using the "+" operator with its registered short code.
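To make the operator structure concrete, the following Python sketch splits a "codecs" value into its permitted options and the short codes combined within each option. It assumes that "+" binds more tightly than "|", which matches the description of "|" as separating profile combinations. The function name is hypothetical, and the sketch does not check short codes against the registry.

```python
from typing import List


def parse_codecs(value: str) -> List[List[str]]:
    """Return one list of short codes per "|"-separated option."""
    options = []
    for option in value.split("|"):
        codes = [code.strip() for code in option.split("+")]
        if not all(codes):
            raise ValueError("empty short code in codecs parameter")
        options.append(codes)
    return options
```

For the value "im2t", the result is a single option holding a single short code. Each returned option would then need to be checked for compatibility with the processor profile specified in this document.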
A processor profile (X) is compatible with the processor profile specified here (P) if X includes all the features and extensions in P (identified by their character content) and the "value" attribute of each is, at least, as restrictive as the "value" attribute of the feature or extension in P that has the same character content. The term "restrictive" here is as defined in Section 6 of [TTML2].
Figure 4 is an example of a valid TTML document that may be carried using the payload format described in this document.
<?xml version="1.0" encoding="UTF-8"?>
<tt xml:lang="en"
    xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    ttp:timeBase="media"
    >
    <head>
        <metadata>
            <ttm:title>Timed Text TTML Example</ttm:title>
            <ttm:copyright>The Authors (c) 2006</ttm:copyright>
        </metadata>
        <styling>
            <!-- s1 specifies default color, font, and text alignment -->
            <style xml:id="s1"
                tts:color="white"
                tts:fontFamily="proportionalSansSerif"
                tts:fontSize="100%"
                tts:textAlign="center"
            />
        </styling>
        <layout>
            <region xml:id="subtitleArea"
                tts:extent="78% 11%"
                tts:padding="1% 5%"
                tts:backgroundColor="black"
                tts:displayAlign="after"
            />
        </layout>
    </head>
    <body region="subtitleArea">
        <div>
            <p xml:id="subtitle1" dur="5.0s">
                How truly delightful!
            </p>
        </div>
    </body>
</tt>
Many of the use cases for TTML are low bit-rate with RTP packets expected to fit within the Path MTU. However, some documents may exceed the Path MTU. In these cases, they may be split between multiple packets. Where fragmentation is used, the following guidelines MUST be followed:
It is RECOMMENDED that documents be fragmented as seldom as possible, i.e., the least possible number of fragments is created out of a document.
Text strings MUST split at character boundaries. This enables decoding of partial documents. As a consequence, document fragmentation requires knowledge of the UTF-8/UTF-16 encoding formats to determine character boundaries.
Document fragments SHOULD be protected against packet losses. More information can be found in Section 9.
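The first two guidelines above can be sketched for UTF-8 documents as a greedy split that never cuts inside a multi-byte character. The Python below is illustrative only. It covers UTF-8 and not UTF-16, the function name is hypothetical, and the caller is assumed to have derived the maximum fragment length from the Path MTU less all header overhead.

```python
from typing import List


def fragment_utf8(document: bytes, max_fragment_len: int) -> List[bytes]:
    """Split a UTF-8 document into as few fragments as possible."""
    if max_fragment_len < 4:
        raise ValueError("a fragment must be able to hold a 4-byte character")
    fragments = []
    start = 0
    while start < len(document):
        end = min(start + max_fragment_len, len(document))
        # Back up while the byte at the cut is a UTF-8 continuation
        # byte (10xxxxxx), so the next fragment starts a new character.
        while end < len(document) and (document[end] & 0xC0) == 0x80:
            end -= 1
        if end <= start:
            raise ValueError("input is not valid UTF-8")
        fragments.append(document[start:end])
        start = end
    return fragments
```

Filling each fragment as far as the character boundary allows keeps the number of fragments to the minimum for a given size limit.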
When a document spans more than one RTP packet, the entire document is obtained by concatenating User Data Words from each consecutive contributing packet in ascending order of Sequence Number.
As described in Section 6, only zero or one TTML document may be active at any point in time. As such, there MUST only be one document transmitted for a given RTP Timestamp. Furthermore, as stated in Section 4.1, the marker bit MUST be set for a packet containing the last fragment of a document. A packet following one where the marker bit is set contains the first fragment of a new document. The first fragment might also be the last.
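The reassembly rules above can be sketched as follows. The Python below is illustrative only. It assumes the caller has already grouped the packets of one document (same RTP Timestamp, beginning with the packet that follows a marker-bit packet), and it ignores Sequence Number wrap-around. The function name and tuple layout are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# (sequence number, marker bit, User Data Words)
Packet = Tuple[int, bool, bytes]


def reassemble(packets: Iterable[Packet]) -> Optional[bytes]:
    """Return the whole document, or None if a fragment is still missing."""
    ordered = sorted(packets, key=lambda p: p[0])
    if not ordered or not ordered[-1][1]:
        return None  # the last fragment (marker bit set) has not arrived
    for (prev_seq, prev_marker, _), (seq, _, _) in zip(ordered, ordered[1:]):
        # Fragments must be consecutive, and only the final one may
        # carry the marker bit.
        if seq != prev_seq + 1 or prev_marker:
            return None
    return b"".join(data for _, _, data in ordered)
```

A single packet with the marker bit set is returned as a complete document, matching the case where the first fragment is also the last.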
Consideration must be devoted to keeping loss of documents due to packet loss within acceptable limits. What is deemed acceptable limits is dependent on the TTML profile(s) used and use case, among other things. As such, specific limits are outside the scope of this document.
Documents MAY be sent without additional protection if end-to-end network conditions guarantee that document loss will be within acceptable limits under all anticipated load conditions. Where such guarantees cannot be provided, implementations MUST use a mechanism to protect against packet loss. Potential mechanisms include Forward Error Correction (FEC) [RFC5109], retransmission [RFC4588], duplication [ST2022-7], or an equivalent technique.
Congestion control for RTP SHALL be used in accordance with [RFC3550] and with any applicable RTP profile, e.g., [RFC3551]. "Multimedia Congestion Control: Circuit Breakers for Unicast RTP Sessions" [RFC8083] is an update to "RTP: A Transport Protocol for Real-time Applications" [RFC3550], which defines criteria for when one is required to stop sending RTP packet streams. Applications implementing this standard MUST comply with [RFC8083], with particular attention paid to Section 4.4 on Media Usability. [RFC8085] provides additional information on the best practices for applying congestion control to UDP streams.
This RTP payload format is identified using the existing application/ttml+xml media type as registered with IANA [IANA] and defined in [TTML-MTPR].
The default clock rate for TTML over RTP is 1000 Hz. The clock rate SHOULD be included in any advertisements of the RTP stream where possible. This parameter has not been added to the media type definition as it is not applicable to TTML usage other than within RTP streams. In other contexts, timing is defined within the TTML document.
When choosing a clock rate, implementers should consider what other media their TTML streams may be used in conjunction with (e.g., video or audio). In these situations, it is RECOMMENDED that streams use the same clock source and clock rate as the related media. As TTML streams may be aperiodic, implementers should also consider the frequency range over which they expect packets to be sent and the temporal resolution required.
The mapping of the application/ttml+xml media type and its parameters [TTML-MTPR] SHALL be done according to Section 3 of [RFC4855].
The type name "application" goes in SDP "m=" as the media name.
The media subtype "ttml+xml" goes in SDP "a=rtpmap" as the encoding name.
The clock rate also goes in "a=rtpmap" as the clock rate.
Additional format-specific parameters, as described in the media type specification, SHALL be included in the SDP file in "a=fmtp" as a semicolon-separated list of "parameter=value" pairs, as described in [RFC4855]. The "codecs" parameter MUST be included in the "a=fmtp" line of the SDP file. Specific requirements for the "codecs" parameter are included in Section 6.1.3.
A sample SDP mapping is presented in Figure 5.
m=application 30000 RTP/AVP 112
a=rtpmap:112 ttml+xml/90000
a=fmtp:112 charset=utf-8;codecs=im2t
In this example, a dynamic payload type 112 is used. The 90 kHz RTP timestamp rate is specified in the "a=rtpmap" line after the subtype. The "codecs" parameter defined in the "a=fmtp" line indicates that the TTML data conforms to Internet Media Subtitles and Captions (IMSC) 1.1 Text profile [TTML-IMSC1.1].
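A receiver can recover the clock rate and the "codecs" parameter from such a description. The following Python sketch reads the "a=rtpmap" and "a=fmtp" lines of a media section like the sample above. It is illustrative only, handles a single payload type, and is not a general SDP parser. The function name and dictionary keys are hypothetical.

```python
def parse_ttml_sdp(sdp: str) -> dict:
    """Extract payload type, encoding name, clock rate, and fmtp parameters."""
    info = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            name, clock_rate = encoding.split("/")[:2]
            info.update(
                payload_type=int(payload_type),
                encoding_name=name,
                clock_rate=int(clock_rate),
            )
        elif line.startswith("a=fmtp:"):
            _, params = line[len("a=fmtp:"):].split(" ", 1)
            # Parameters are a semicolon-separated list of name=value pairs.
            info["fmtp"] = dict(
                p.strip().split("=", 1) for p in params.split(";") if p.strip()
            )
    if "codecs" not in info.get("fmtp", {}):
        raise ValueError("the a=fmtp line must carry the codecs parameter")
    return info
```

Failing early when "codecs" is absent reflects the requirement that the parameter be present in the "a=fmtp" line.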
This document has no IANA actions.
RTP packets using the payload format defined in this specification are subject to the security considerations discussed in the RTP specification [RFC3550] and in any applicable RTP profile, such as RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124]. However, as "Securing the RTP Protocol Framework: Why RTP Does Not Mandate a Single Media Security Solution" [RFC7202] discusses, it is not an RTP payload format's responsibility to discuss or mandate what solutions are used to meet the basic security goals (like confidentiality, integrity, and source authenticity) for RTP in general. This responsibility lies with anyone using RTP in an application. They can find guidance on available security mechanisms and important considerations in "Options for Securing RTP Sessions" [RFC7201]. Applications SHOULD use one or more appropriate strong security mechanisms. The rest of this Security Considerations section discusses the security impacting properties of the payload format itself.
To avoid potential buffer overflow attacks, receivers should take care to validate that the User Data Words in the RTP payload are of the appropriate length (using the Length field).
This payload format places no specific restrictions on the size of TTML documents that may be transmitted. As such, malicious implementations could be used to perform denial-of-service (DoS) attacks. [RFC4732] provides more information on DoS attacks and describes some mitigation strategies. Implementers should take into consideration that the size and frequency of documents transmitted using this format may vary over time. As such, sender implementations should avoid producing streams that exhibit DoS-like behaviour, and receivers should avoid false identification of a legitimate stream as malicious.
As with other XML types and as noted in Section 10 of "XML Media Types" [RFC7303], repeated expansion of maliciously constructed XML entities can be used to consume large amounts of memory, which may cause XML processors in constrained environments to fail.
In addition, because of the extensibility features for TTML and of XML in general, it is possible that "application/ttml+xml" may describe content that has security implications beyond those described here. However, TTML does not provide for any sort of active or executable content, and if the processor follows only the normative semantics of the published specification, this content will be outside TTML namespaces and may be ignored. Only in the case where the processor recognizes and processes the additional content or where further processing of that content is dispatched to other processors would security issues potentially arise. And in that case, they would fall outside the domain of this RTP payload format and the application/ttml+xml registration document.
Although not prohibited, there are no expectations that XML signatures or encryption would normally be employed.
Further information related to privacy and security at a document level can be found in Appendix P of [TTML2].
Thanks to Nigel Megitt, James Gruessing, Robert Wadge, Andrew Bonney, James Weaver, John Fletcher, Frans de Jong, and Willem Vermost for their valuable feedback throughout the development of this document. Thanks to the W3C Timed Text Working Group and EBU Timed Text Working Group for their substantial efforts in developing the timed text format this payload format is intended to carry.