W3C

WebVMT: The Web Video Map Tracks Format

W3C Group Note

More details about this document
This version:
https://www.w3.org/TR/2023/NOTE-webvmt-20230919/
Latest published version:
https://www.w3.org/TR/webvmt/
Latest editor's draft:
https://w3c.github.io/sdw/proposals/geotagging/webvmt/
History:
https://www.w3.org/standards/history/webvmt/
Commit history
Editor:
Rob Smith (Away Team Software Ltd)
Feedback:
GitHub w3c/sdw (pull requests, new issue, open issues)
public-sdwig@w3.org with subject line [webvmt] … message topic … (archives)
OGC Document Number
OGC 23-037

Copyright © 2023 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.


Abstract

This specification defines WebVMT, the Web Video Map Tracks format, which is an enabling technology whose main use is for marking up external metadata track resources in connection with the HTML <track> element. WebVMT files provide map presentation, annotation and interpolation synchronized to web media content, and more generally any form of data that is time-aligned with audio or video content, including those from location-aware devices such as dashcams, drones and smartphones.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document is a Note; it has not been widely reviewed and should be considered as experimental only. It may serve as the base for an upcoming W3C Recommendation.

This document is an explanatory specification, intended to communicate and develop the draft WebVMT format through discussion with user communities.

This document was published by the Spatial Data on the Web Working Group as a Group Note using the Note track.

This Group Note is endorsed by the Spatial Data on the Web Working Group, but is not endorsed by W3C itself nor its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The W3C Patent Policy does not carry any licensing requirements or commitments on this document.

This document is governed by the 12 June 2023 W3C Process Document.

1. Use Cases

This section is non-normative.

This section details example scenarios in which WebVMT can add significant value with identified benefits.

1.1 Coastguard/Mountain Rescue

A missing person is reported to the rescue services, who deploy a drone to search inaccessible areas of coastline or moorland for their target. The drone relays back a live video stream from its camera and geolocation data from its GPS receiver to a remote human operator who is piloting it.

As the search continues, the operator spots a target on the video feed and can instantly call up an electronic map, synchronized to the video, which has been automatically following the drone’s position and plotting its ground track. The display gives the operator immediate context for the video, and allows them to override the automatic map control and zoom in to pinpoint the target’s precise location from the features visible in the video and on the map/satellite view. They mark the location and then zoom out to assess the surrounding terrain and advise the recovery team of the best approach to the target. For example, the terrain may dictate very different approach routes if either the person has twisted their ankle at the top of a cliff, or has fallen and is lying at the bottom of the same cliff, though the co-ordinates are almost identical in both cases.

The operator has been able to make important decisions quickly, which may be life critical, and deploy recovery resources effectively.

Note: Benefits
  • Rapid decision making;
  • Effective resource deployment.

1.2 Area Survey

A survey drone is equipped with a camera which records an image of the ground directly below it. The pilot is a remote human operator, tasked with surveying a defined area from a particular height in order to capture the required data.

As the survey progresses, zones are automatically marked on the map to represent areas which have been included. Once the drone has finished its sweep, the operator can quickly confirm whether the required area has been completely covered. If any areas have been missed, the pilot can use the map to navigate and make additional passes to fill the gaps, before returning to base.

Adding WebVMT files to the survey archive provides a geospatial index to the videos, allowing a particular geographic location to be found more rapidly by virtue of their small file size in comparison to their linked media. Online video archives can be indexed more quickly using this web-friendly format.

The operator has been easily able to verify the quality of their own work and correct any errors, saving time and additional effort in redeployment. Video footage has been indexed by geolocation rapidly and in a search-engine-friendly format.

Note: Benefits
  • Autonomous quality assurance;
  • Cost saving;
  • Rapid archive indexing for search engines.

1.3 Outdoor Trails

An outdoor sportsperson, e.g. snowboarder or cyclist, is equipped with a helmet camera and/or mobile phone to record video footage and GPS data. They set off to find new challenges and practice their skills, e.g. off-piste or on mountain trails, and discover new routes and areas that they would like to explore in future, chatting to the camera as they go. Afterwards, they upload the video to share their experience with the online community, so others can quickly identify locations of particularly interesting sections of the featured trail. Using the synchronized map view in their browser, community members can easily see where they need to go in order to explore these places for themselves.

The operator has been able to fully engage in their sporting activity, without making any written notes, while simultaneously recording the details needed to guide others to the same locations. Their changing location over time can also be used to calculate speed and distance information, which can be displayed alongside the footage.

Note: Benefits
  • Non-invasive capture;
  • Information sharing;
  • Speed and distance calculation.

1.4 TV Sports Coverage

A TV production company is covering a sports event that takes place over a large area, e.g. rallying, road cycling or sailing, using a number of mobile video devices including competitor cams, e.g. dash cams or helmet cams, and drones to provide shots of inaccessible areas, e.g. remote terrain or over water.

Feeds from all the cameras are streamed to the production control room, where their geolocation data are combined on a map showing the locations of every competitor and camera, each labelled for easy identification. The live map enables the director to quickly choose the best shot and anticipate where and when to deploy their drone cameras to catch competitors at critical locations on the course as the competition develops in real time.

Multiple operators can function concurrently, both autonomously and under central direction. Mobile assets can be monitored and deployed from an operations centre to provide optimum coverage of the developing live event.

Note: Benefits
  • Multiple mobile video devices;
  • Real time asset management.

1.5 Proxy Explorer

Important details of a remote area have been captured on video. It is not possible to revisit the location for safety reasons or because it has physically changed in the intervening time. Footage can be retrospectively geotagged against a concurrent map to allow the viewer to better interpret and identify features seen in the footage. Explanatory annotations can be added to the WebVMT file to help future viewers' understanding and aggregate the collective analysis.

Multiple operators can contribute their observations to provide a group analysis, iteratively adding new details and discarding out-of-date information. Experts can offer insight about filmed locations, which would otherwise be inaccessible to them.

Note: Benefits
  • Remote analysis of inaccessible locations;
  • Knowledge aggregation and sharing for archive footage.

1.6 Treasure Hunt

A TV production company designs a new game show which involves competitors searching for targets across a wide area, with an operations centre remotely monitoring their progress and providing updates. Competitors are equipped with body-worn video or helmet cameras to relay footage of their view.

Geolocation context allows central operators to better understand the participants' actions and to remotely direct them more efficiently. Competitors' positions can be displayed to the TV audience on annotated 2D- or 3D-maps for clearer presentation.

Note: Benefits
  • Speed and distance calculation;
  • Knowledge aggregation and sharing for real-time footage.

1.7 Swarm Monitoring

A swarm of drones is deployed to perform a task, and their operations are monitored centrally. Geolocation details of the swarm are automatically collated and broadcast to the drone pilots, showing the locations of all the drones, each circled with a suitable safety zone to warn operators if two units find themselves flying in close proximity.

Pilots are safely able to operate either autonomously or under the direction of central control. Extra zonal information can be added to the operators' maps to show the outer perimeter of their operating area and warn of fixed aerial hazards, e.g. a radio mast, or transient hazards, e.g. a helicopter.

Note: Benefits
  • Static and dynamic hazard indication;
  • Central swarm monitoring;
  • Autonomous swarm monitoring.

1.8 Crisis Response

Disaster strikes, e.g. hurricane or tsunami, and emergency response teams are deployed to the affected area. However, it is difficult to verify which problems people are facing, what resources would help them and exactly where these events are occurring. Maps are unreliable as the infrastructure has been damaged, though people on the ground have the relevant knowledge if it could be reliably recorded and shared.

Anyone with a basic smartphone could video events with reliable geospatial data, as GPS receivers can operate without the need for a mobile phone signal by using satellite data, to accurately document the problems they face. Even if the cell network is not operational, this information can be physically delivered to crisis coordinators to notify them of the issues that need to be addressed, including accurate location data in a common format. Response teams can quickly search archived video by location to verify latest updates with recent context. Crisis events can be reliably recorded, knowledge can be shared and aggregated, and relief resources can be accurately targeted and deployed to the correct locations.

Note: Benefits
  • Reliable dispersed information gathering and sharing;
  • Accurate resource deployment.

1.9 Police Evidence

A web-based police system is established to allow dashcam video evidence of driving offences to be submitted digitally by members of the public who have witnessed them. Detectives are able to identify the time and vehicles involved directly from the uploaded footage, and accurately determine the location at which the incident occurred from the digital timed metadata included.

The ability to accept open format data also makes the system available to cyclists and pedestrians who can record video with location on their helmet cameras and smartphones respectively, providing wider access to the service beyond the dashcam community. Metadata, e.g. location, from different video manufacturers is often recorded in mutually-incompatible formats, but WebVMT support enables synchronized location (and other) data to be extracted from recordings using manufacturers' or community tools, without affecting source video integrity, and submitted to the police system in a common format, significantly reducing development costs.

Officers have been able to identify incident locations quickly and accurately, without sacrificing evidence integrity. The online service has been made available to a wider audience of drivers, cyclists and pedestrians, without incurring additional development costs.

Note: Benefits
  • Accurate location with evidence integrity preserved;
  • Development costs reduced;
  • Service extended to a wider audience.

1.10 Area Monitoring

An area of interest is monitored operationally by a collection of different mobile video devices, e.g. drones, body-worn video, helicopter, etc. Video footage, possibly in different formats, is added to an archive with location (and other) metadata in a common format which forms a time-location index suitable for rapid parsing by a web crawler. Users can submit online queries to search by location and return a time-ordered sequence of video frame stills captured within a radial distance of the chosen location. Alternatively, sensor data can be searched, e.g. for high readings, to return matching geotagged video frames for further analysis.

Video archives can be quickly indexed using a common metadata format regardless of video encoding, e.g. MPEG, WebM, OGG, and video files are only accessed in case of a positive search result, which reduces bandwidth in comparison to embedded metadata. Linked files also allow different security permissions to be applied to the crawling and querying processes, so an AI algorithm can be authorised to read metadata without being able to access image content if there are security concerns over data privacy, e.g. illicit facial recognition.

Note: Benefits
  • Homogenized video metadata from disparate sources;
  • Reduced search bandwidth;
  • Structured security support;
  • Web search engine compatible.
Example 1: Remote Maintenance
Visually monitor inaccessible assets, e.g. off-shore wind turbines, using autonomous drones to create a historical video archive that enables remote expert diagnosis of operational issues.
Example 2: Flood Monitoring
Aggregate video footage from disparate sources to create a historical video archive that allows water levels to be monitored at different locations over time to help predict flooding.

1.11 Vehicle Collision

Dashcam footage is searched to automatically identify vehicle collisions from impact acceleration profiles recorded in video metadata. Dashcam manufacturers typically embed metadata in an unpublished format and provide a proprietary video player to allow users to display it. Exporting embedded metadata to a linked file in a web-friendly format enables searchable video archive data to be shared quickly and easily, without affecting evidence integrity, and to be accessed through a common web interface.

Vehicles can be automatically monitored using a low-cost dashcam and web-based tools to ensure that collisions are accurately recorded by drivers and that commercial vehicles remain safe and undamaged. Interoperability means that users are not limited to a particular brand and can share evidence with insurers and the police in a common format without damaging its integrity.

Note: Benefits
  • Accurate vehicle collision detection;
  • Common format for data sharing;
  • Web search support;
  • Evidence integrity preserved.
Example 3: Motor Insurance
Automatically identify vehicle collisions in dashcam footage to provide forensic evidence for a police investigation or motor insurance claim.

1.12 Golden Tutorial

Augmented reality (AR) software is used to control assets or view content in situ at a particular location. For example, nearby street lights can be switched off or on by a service engineer for maintenance purposes, or an architect can see how their structural design integrates with the surrounding landscape at its proposed location before any building work has started.

Video footage can be recorded with location, camera orientation and other metadata so that AR overlays can be generated on demand. Such recordings can be used to demonstrate how AR content is displayed and controlled in order to educate users with a 'golden tutorial', to provide 'proof of action' as evidence of work done for auditing purposes, or to create example data for AR software testing and debugging.

Note: Benefits
  • Accurate AR video and data recording;
  • Improved AR software development.

1.13 Virtual Guide

A user triggers an audio track which provides guidance about the local area or instruction for a known object, e.g. a Web of Things (WoT) device at that location. The audio timeline is synchronized with events that can display AR content, control WoT devices and display points of interest on a map. These events provide guidance with real world context by highlighting places or objects of interest and showing possible actions.

Users can be guided by a virtual assistant through an area of interest or sequence of actions augmented with AR/VR and WoT devices to visualise events and by an annotated map or model to provide additional geospatial context. Greater insight is given to the user by showing detailed views of the location on a map or internal structure of the identified object using a virtual model.

Note: Benefits
  • Contextual guidance provided in situ;
  • Concurrent operation with AR/VR video;
  • Integration with Web of Things.
Example 4: Historic Site Guide
Provide an audio guide to visitors at a historic site which is triggered by proximity to a location of interest and synchronized with AR content which reconstructs past buildings and events. The same concept could be implemented in VR to allow remote users to explore the site.
Example 5: Medical Tool
Allow a patient to describe their symptoms using AR, e.g. by pointing to a painful area on their own body, which is also modelled as a 'map' to show internal features and display a treatment guide, including any WoT medical devices.
Example 6: On-Site Maintenance
Guide a user to control and maintain assets, e.g. infrastructure or machinery. A maintenance engineer could switch off an individual street light in order to replace the bulb using an AR control on the WoT lamppost and request procedural guidance for that particular variant. A farm worker could be guided through operational and diagnostic procedures for agricultural equipment to help rectify a fault while on site.

2. State of the Art

This section is non-normative.

No standard format currently exists by which web browsers can synchronise geolocation data with video. Though many browser-supported formats exist to present the two data streams separately, e.g. MPEG for video and GPX for geolocation, there is no viable synchronisation mechanism for video playback time with geolocation information.

2.1 Current Solutions

Material Exchange Format (MXF) was developed by the Society of Motion Picture and Television Engineers (SMPTE) to synchronise metadata, including geolocation, with audio and video streams using a register of key-length-value (KLV) triples. The breadth of its scope has resulted in interoperability issues, as different vendors implement different parts of the standard, and has produced implementations from high-profile companies which are mutually incompatible. KLVs can also be embedded within MPEG files, though this does not address the synchronisation issue for other web video formats such as WebM.

Video camera manufacturers have taken various approaches, resulting in a number of non-standard solutions including embedding geolocation data within the MPEG metadata stream in disparate formats, such as Motion Imagery Standards Board (MISB) or GoPro Metadata Format (GPMF), or recording a separate geolocation file in a proprietary format alongside the associated video file. From a hardware perspective, only a few high-end cameras provide geotagging out of the box, and the rest require an add-on device to support this feature.

Geospatial data are not currently accessible in the video Document Object Model (DOM) in HTML nor via video playback APIs in smartphones, e.g. Android, though their host devices are typically equipped with both a video camera and Global Navigation Satellite System (GNSS) receiver capable of capturing the required information.

In sharp contrast, still photos have a well-established geotagging standard called Exif, which was published by the Japan Electronic Industries Development Association (JEIDA) in 1995 and defines a metadata tag to embed geolocation data within TIFF and JPEG images. This is widely supported by manufacturers of photographic equipment and software worldwide, including low-end smartphones, making this feature cheap and accessible to the public.

2.2 Growing Requirements

Historically, there has been no requirement for a comparable video standard, but the urgency for such a standard is growing fast due to the emerging markets for 'mobile video devices,' e.g. drones, dashcams, body-worn video and helmet cameras, as well as the rise in high-quality video and geolocation support in the global smartphone market.

2.3 Accessible Standard Opportunity

Using current W3C recommendations, it is possible for a programmer to synchronise video-geolocation 'metadata' with a <video> element using a <track> child element. However, this is a non-trivial development task which requires an understanding of the video DOM events and JavaScript file handling, making it inaccessible to the vast majority of web users. Video metadata tracks are an identified kind of track data in HTML, though metadata content is difficult to access due to the text-based nature of existing DOM support.

Establishing a standard file format would allow interoperability and information sharing between the public, the emergency services, police and other mobile video device users, e.g. drone pilots, giving cheaper and easier access to this important resource. Native web browser support for geotagged video using this file format would also make this freely accessible to most web users and enable integration with existing web services such as online maps and search engines. Current low-end smartphones already provide suitable hardware to concurrently capture video and geolocation streams, which would make this technology easily accessible to the general public, and encourage the user and developer communities to grow rapidly.

3. Proposed Solution

This section is non-normative.

This proposal constitutes a lightweight markup language to synchronise video with geolocation data for display on electronic maps, such as OpenStreetMap. It offers presentational control of the map display, e.g. pan and zoom, and annotation to highlight map features to the viewer, e.g. paths and zones.

WebVMT (Web Video Map Tracks) format is intended for marking up external map track resources, and its main use is for files synchronising video content with an annotated map presentation. Ideas have been borrowed from existing W3C formats, including WebVTT's HTML binding and its block and cue structures, and SVG's approach to drawing and interpolation, in order to display output on an electronic map.

The format mimics WebVTT's structure and syntax for media synchronisation, with cue details listed in an accessible text-based file linked to a <video> or <audio> DOM element by a child <track> element in an HTML document.

Example 7: WebVMT Basic HTML
<!doctype html>
<html>
  <head>
    <title>WebVMT Basic Example</title>
  </head>
  <body>
    <!-- Video display -->
    <video controls width="640" height="360">
      <source src="video.mp4" type="video/mp4">
      <track src="maptrack.vmt" kind="metadata" for="vmt-map" tileurl="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png?key=VALID_OSM_KEY">
      Your browser does not support the video tag.
    </video>
    <!-- Map display -->
    <div id="vmt-map"></div>
  </body>
</html>

The WebVMT format file, e.g. maptrack.vmt, contains the map cues associated with the video, e.g. video.mp4.

Editor's note

The meaning of the for and tileurl attributes for user agents is an open question. Initial solutions can be built using JavaScript, with existing map libraries such as Leaflet, though the vision is that future user agents will handle map rendering in the longer term.

3.1 Map Cues

Map cues display their payload between a start time and end time. The cue end time may be omitted to represent an unknown end time.

3.1.1 Hello World

Here is a sample WebVMT file with a cue highlighting Tower Bridge in London on a static map.

Example 8: Tower Bridge WebVMT
WEBVMT

MEDIA
url:TowerBridge.mp4
mime-type:video/mp4

MAP
lat:51.506 lng:-0.076
rad:250

00:00:02.000 --> 00:00:05.000
{ "move-to":
  { "lat": 51.504362, "lng": -0.076153 }
}
{ "line-to":
  { "lat": 51.506646, "lng": -0.074651 }
}
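As a rough illustration of how the map configuration in such a file might be read, the following Python sketch parses a MAP block into numeric settings. It assumes the layout used by the examples in this document, i.e. a line containing MAP followed by name:value settings separated by spaces or newlines. The function name parse_map_block is hypothetical and is not defined by the format.

```python
def parse_map_block(block):
    """Parse a MAP block into a dict of floats (illustrative sketch only).

    Assumes the first line is 'MAP' and that every following line holds
    one or more 'name:value' settings separated by whitespace.
    """
    lines = block.strip().splitlines()
    if not lines or lines[0].strip() != "MAP":
        raise ValueError("not a MAP block")
    settings = {}
    for line in lines[1:]:
        for item in line.split():
            name, separator, value = item.partition(":")
            if not separator:
                raise ValueError(f"expected name:value, got {item!r}")
            settings[name] = float(value)
    return settings


tower_bridge = parse_map_block("MAP\nlat:51.506 lng:-0.076\nrad:250")
```

Here tower_bridge holds the lat, lng and rad values from the Tower Bridge example as floating-point numbers.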

3.1.2 Map Presentation

Cues also allow dynamic presentation to pan and zoom the map. This example focusses attention on the Tower of London.

Cues without end times are displayed until the end of the video.

Example 9: Tower of London WebVMT
WEBVMT

MEDIA
url:../movies/TowerOfLondon.webm
mime-type:video/webm

MAP
lat:51.162 lng:-0.143
rad:20000

00:00:03.000 -->
{ "pan-to":
  { "lat": 51.508, "lng": -0.077, "end": "00:00:05.000" }
}

00:00:06.000 -->
{ "zoom":
  { "rad": 250 }
}
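The cue timing lines in the examples above can be read with a short parser. This Python sketch converts a timing line to seconds and returns None for an omitted end time. The function names and the regular expression are illustrative assumptions, and only the hh:mm:ss.ttt form used in this document's examples is handled.

```python
import re

# Assumed timestamp form: hh:mm:ss.ttt, as used by the examples in this document.
_TIMESTAMP = re.compile(r"^(\d{2,}):([0-5]\d):([0-5]\d)\.(\d{3})$")


def parse_timestamp(text):
    """Convert 'hh:mm:ss.ttt' to seconds as a float."""
    match = _TIMESTAMP.match(text.strip())
    if match is None:
        raise ValueError(f"bad timestamp: {text!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_cue_timing(line):
    """Return (start, end) in seconds; end is None when the end time is omitted."""
    start_text, arrow, end_text = line.partition("-->")
    if not arrow:
        raise ValueError(f"missing '-->' in {line!r}")
    end_text = end_text.strip()
    end = parse_timestamp(end_text) if end_text else None
    return parse_timestamp(start_text), end


bounded = parse_cue_timing("00:00:02.000 --> 00:00:05.000")
unbounded = parse_cue_timing("00:00:03.000 -->")
```

A player built on this sketch could treat a None end time as "display until the end of the video", as described above.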

3.2 Comments

Comments are blocks that are preceded by a blank line, start with the word NOTE (followed by a space or newline), and end at the first blank line.

3.2.1 Comment Block

Comment block format is identical to WebVTT.

Example 10: Tower Landmarks WebVMT
WEBVMT

NOTE Associated video

MEDIA
url:/home/myuser/movies/TowerLandmarks.ogg
mime-type:video/ogg

NOTE Map config

MAP
lat:51.506 lng:-0.076
rad:500

NOTE Tower Bridge

00:00:01.000 --> 00:00:05.000
{ "move-to":
  { "lat": 51.504362, "lng": -0.076153 }
}
{ "line-to":
  { "lat": 51.506646, "lng": -0.074651 }
}

NOTE City Hall

00:00:02.000 -->
{ "circle":
  { "lat": 51.504789, "lng": -0.078642, "rad": 20 }
}

NOTE Tower Of London
This line is also part of the comment

00:00:03.000 --> 00:00:04.000
{ "polygon":
  { "perim":
    [ { "lat": 51.507193, "lng": -0.074844 },
      { "lat": 51.508756, "lng": -0.074716 },
      { "lat": 51.509036, "lng": -0.075638 },
      { "lat": 51.508929, "lng": -0.077162 },
      { "lat": 51.507727, "lng": -0.077848 },
      { "lat": 51.507220, "lng": -0.075767 }
    ]
  }
}
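Because a comment block runs from a line starting with NOTE to the next blank line, a reader can discard comments by splitting the file on blank lines. The Python sketch below shows one way to do this. The function name strip_comment_blocks is hypothetical, and the sketch assumes that all blocks in the file are separated by blank lines.

```python
import re


def strip_comment_blocks(text):
    """Split text into blank-line-separated blocks and drop NOTE comments."""
    normalised = text.replace("\r\n", "\n").strip()
    blocks = [block for block in re.split(r"\n\s*\n", normalised) if block.strip()]
    kept = []
    for block in blocks:
        first_line = block.split("\n", 1)[0]
        # A comment starts with the word NOTE followed by a space or a newline.
        if first_line == "NOTE" or first_line.startswith("NOTE "):
            continue
        kept.append(block)
    return kept


sample = (
    "WEBVMT\n\n"
    "NOTE Map config\n\n"
    "MAP\nlat:51.506 lng:-0.076\nrad:500\n\n"
    "NOTE Tower Of London\nThis line is also part of the comment\n\n"
    '00:00:03.000 --> 00:00:04.000\n{ "circle": { "lat": 51.5, "lng": -0.07, "rad": 20 } }\n'
)
blocks = strip_comment_blocks(sample)
```

For the sample above, blocks contains the WEBVMT line, the MAP block and the cue, with both comments removed, including the second line of the multi-line comment.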

3.3 Styling

Display style is controlled by CSS, which may be embedded in HTML or within the WebVMT file.

3.3.1 CSS Style in HTML

In this example, an HTML page has a CSS style sheet in a <style> element that styles map cues for the video, e.g. drawing lines in red.

Example 11: Style Within HTML
<!doctype html>
<html>
  <head>
    <title>WebVMT Style Example</title>
    <style>
      video::cue {
        stroke: red;
        stroke-opacity: 0.9;
      }
    </style>
  </head>
  <body>
    <video controls width="640" height="360">
      <source src="video.mp4" type="video/mp4">
      <track src="maptrack.vmt" kind="metadata" for="vmt-map" tileurl="https://api2.ordnancesurvey.co.uk/mapping_api/v1/service/zxy/EPSG%3A3857/Outdoor%203857/{z}/{x}/{y}.png?key=VALID_OS_KEY">
      Your browser does not support the video tag.
    </video>
    <div id="vmt-map"></div>
  </body>
</html>

3.3.2 CSS Style Block

Style block format is similar to WebVTT.

CSS style sheets can also be embedded within WebVMT files. Style blocks are placed after any headers but before the first cue, and start with the word STYLE.

Comment blocks can be interleaved with style blocks.

Example 12: Greenwich Meridian WebVMT
WEBVMT

MEDIA
url:http://example.com/movies/Greenwich.mp4
mime-type:video/mp4

MAP
lat:51.478 lng:-0.001
rad:50

STYLE
::cue {
  stroke: red;
}

NOTE Comments are allowed between style blocks

STYLE
::cue {
  stroke-opacity: 0.9;
}
/* Style blocks cannot use blank lines nor "dash dash greater than" */

NOTE Prime Meridian marker

00:00:00.000 -->
{ "move-to":
  { "lat": 51.477901, "lng": -0.001466 }
}
{ "line-to":
  { "lat": 51.477946, "lng": -0.001466 }
}

NOTE Style blocks may not appear after the first cue

3.4 Data Synchronization

Arbitrary data may be associated with a WebVMT cue using a sync command, in a similar fashion to the GPX <extensions> element.

Example 13: AI Training WebVMT
WEBVMT

NOTE Associated video

MEDIA
url:Animals.mp4
mime-type:video/mp4

NOTE Map config

MAP
lat:51.1618 lng:-0.1428
rad:200

NOTE Cat, top left, after 5 secs until 25 secs

00:00:05.000 --> 00:00:25.000
{ "sync": { "type": "org.ogc.geoai.example", "data":
  { "animal": "cat", "frame-zone": "top-left" }
} }

NOTE Dog, mid right, after 10 secs until 40 secs

00:00:10.000 --> 00:00:40.000
{ "sync": { "type": "org.ogc.geoai.example", "data":
  { "animal": "dog", "frame-zone": "middle-right" }
} }
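A tool that exports sensor or annotation data could generate cues like those above programmatically. This Python sketch builds the text of one cue carrying a single sync command. The helper names format_timestamp and make_sync_cue are hypothetical, and the sketch assumes that a command may be written as JSON on a single line rather than spread over several lines as in the example.

```python
import json


def format_timestamp(seconds):
    """Format a time in seconds as hh:mm:ss.ttt."""
    millis = round(seconds * 1000)
    hours, remainder = divmod(millis, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, ms = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def make_sync_cue(start, end, data_type, data):
    """Build the text of one cue containing a single sync command."""
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    payload = json.dumps({"sync": {"type": data_type, "data": data}})
    return f"{timing}\n{payload}"


cat_cue = make_sync_cue(
    5, 25, "org.ogc.geoai.example", {"animal": "cat", "frame-zone": "top-left"}
)
```

Using json.dumps also guarantees straight double quotes in the output, which matters because typographic quotes are not valid JSON.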

3.5 Interpolation

Data values may be interpolated using an interp command, in a similar way to the <animate> element in SVG.

Sensor data can be interpolated between sample points to provide intermediate values where necessary, while retaining the original source data sample values.

Three interpolation schemes are supported:

3.5.1 Step Interpolation

A stepwise-interpolated value, e.g. vehicle gear selection, remains constant until the next sample time.

Figure 1: Step interpolation.
Example 14: Step Interpolation WebVMT
WEBVMT

NOTE Required blocks omitted for clarity

NOTE Step interpolation of sensor1 data
     gear = 4 after 2 secs until 6 secs

00:00:02.000 --> 00:00:06.000
{ "sync":
  { "type": "org.webvmt.example", "id": "sensor1", "data":
    { "gear": "4" }
  }
}

NOTE Step interpolation of sensor1 data
     gear = 5 after 6 secs until 9 secs

00:00:06.000 --> 00:00:09.000
{ "sync":
  { "id": "sensor1", "data":
    { "gear": "5" }
  }
}
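One way to evaluate a stepwise-interpolated value at playback time is to look up the most recent sample. The Python sketch below assumes the samples have already been extracted from the cues as (time, value) pairs sorted by time. The function name step_value is hypothetical, and the sketch does not model the end time of the final cue.

```python
from bisect import bisect_right


def step_value(samples, t):
    """Return the most recent sample value at time t.

    samples is a list of (time_in_seconds, value) pairs sorted by time.
    Returns None if t is before the first sample.
    """
    times = [time for time, _ in samples]
    index = bisect_right(times, t) - 1
    return samples[index][1] if index >= 0 else None


gear_samples = [(2.0, "4"), (6.0, "5")]  # values taken from the example above
gear_at_5s = step_value(gear_samples, 5.0)
gear_at_6s = step_value(gear_samples, 6.0)
```

With the sample values from the example, the gear remains "4" until the next sample at 6 seconds, at which point it becomes "5".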

3.5.2 Linear Interpolation

A linearly-interpolated value, e.g. temperature, changes to a final value at the next sample time in direct proportion to the elapsed sample interval.

Figure 2: Linear interpolation.
Example 15: Linear Interpolation WebVMT
WEBVMT

NOTE Required blocks omitted for clarity

NOTE Linear interpolation of sensor2 data
     temperature = 14 -> 16 after 4 secs until 6 secs

00:00:04.000 --> 00:00:06.000
{ "sync":
  { "type": "org.webvmt.example", "id": "sensor2", "data":
    { "temperature": "14" }
  }
}
{ "interp": { "to":
  { "data": { "temperature": "16" } }
} }

NOTE Linear interpolation of sensor2 data
     temperature = 16 -> 19 after 6 secs until 9 secs

00:00:06.000 --> 00:00:09.000
{ "sync":
  { "id": "sensor2" }
}
{ "interp": { "to":
  { "data": { "temperature": "19" } }
} }
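The intermediate value for a linearly-interpolated cue follows from the fraction of the sample interval that has elapsed. This Python sketch shows the arithmetic under the assumption that the start and end values have already been converted from the strings in the cue to numbers. The function name linear_value is hypothetical.

```python
def linear_value(t, start, end, from_value, to_value):
    """Linearly interpolate between from_value at start and to_value at end."""
    if not start <= t <= end:
        raise ValueError("t lies outside the sample interval")
    if end == start:
        return to_value
    fraction = (t - start) / (end - start)
    return from_value + fraction * (to_value - from_value)


# First cue of the example above: 14 -> 16 between 4 and 6 seconds.
temperature_at_5s = linear_value(5.0, 4.0, 6.0, float("14"), float("16"))
```

Halfway through the first cue, at 5 seconds, this gives a temperature of 15.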

3.5.3 Discrete Interpolation

A discretely-interpolated value, e.g. headcount in a video frame, is only valid instantaneously at the sample time.

Figure 3: Discrete interpolation.
Example 16: Discrete Interpolation WebVMT
WEBVMT

NOTE Required blocks omitted for clarity

NOTE Discrete interpolation of sensor3 data
     headcount = 12 at 4 secs

00:00:04.000 --> 00:00:04.000
{ "sync":
  { "type": "org.webvmt.example", "id": "sensor3", "data":
    { "headcount": "12" }
  }
}

NOTE Discrete interpolation of sensor3 data
     headcount = 34 at 6 secs

00:00:06.000 --> 00:00:06.000
{ "sync":
  { "id": "sensor3", "data":
    { "headcount": "34" }
  }
}
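A discrete value has no meaning between samples, so a lookup should return nothing unless the query time coincides with a sample time. The Python sketch below illustrates this. The function name discrete_value and the tolerance parameter are assumptions, the latter added because playback times rarely match a sample time exactly.

```python
def discrete_value(samples, t, tolerance=0.0):
    """Return the value sampled at time t, or None if no sample matches.

    samples is a list of (time_in_seconds, value) pairs.
    tolerance is an assumed matching window in seconds, e.g. one frame duration.
    """
    for time, value in samples:
        if abs(time - t) <= tolerance:
            return value
    return None


headcount_samples = [(4.0, "12"), (6.0, "34")]  # values from the example above
at_sample = discrete_value(headcount_samples, 4.0)
between_samples = discrete_value(headcount_samples, 5.0)
```

Here at_sample is "12", while between_samples is None because no headcount was recorded at 5 seconds.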

3.5.4 Live Stream Interpolation

Live streams can be recorded with interpolation using unbounded cues, i.e. a cue with an unknown end time.

In this example, the result is identical to the previous step interpolation example but without requiring knowledge of any future data values during the live capture process.

Example 17: Live Step Interpolation WebVMT
WEBVMT

NOTE Required blocks omitted for clarity

NOTE Step interpolation of live1 data
     gear = 4 after 4 secs until next update

00:00:04.000 -->
{ "sync":
  { "type": "org.webvmt.example", "id": "live1", "data":
    { "gear": "4" }
  }
}

NOTE Step interpolation of live1 data
     gear = 5 after 6 secs until next update

00:00:06.000 -->
{ "sync":
  { "id": "live1", "data":
    { "gear": "5" }
  }
}

NOTE End (step) interpolation of live1 data
     gear = 5 at 9 secs

00:00:09.000 --> 00:00:09.000
{ "sync":
  { "id": "live1", "data":
    { "gear": "5" }
  }
}

In the next example, the result is identical to the previous linear interpolation example but without requiring knowledge of any future data values during the live capture process.

Example 18: Live Linear Interpolation WebVMT
WEBVMT

NOTE Required blocks omitted for clarity

NOTE Linear interpolation of live2 data
     temperature = 14 after 4 secs until next update

00:00:04.000 -->
{ "sync":
  { "type": "org.webvmt.example", "id": "live2", "data":
    { "temperature": "14" }
  }
}
{ "interp": { "end": "00:00:06.000", "to":
  { "data": { "temperature": "16" } }
} }

NOTE Linear interpolation of live2 data
     temperature = 16 after 6 secs until next update

00:00:06.000 -->
{ "sync":
  { "id": "live2" }
}
{ "interp": { "end": "00:00:09.000", "to":
  { "data": { "temperature": "19" } }
} }

NOTE End (linear) interpolation of live2 data
     temperature = 19 at 9 secs

00:00:09.000 --> 00:00:09.000
{ "sync":
  { "id": "live2", "data":
    { "temperature": "19" }
  }
}
Note: Interpolation During Live Capture

Some values cannot be interpolated during capture because future data are unknown, e.g. for linear interpolation, but they can be interpolated correctly after capture, during subsequent playback, once the end values are known.
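Step interpolation is the case that needs no future data, because the current value is simply the latest sample received. A minimal Python sketch of that lookup follows, using the live1 samples from the live step example. The function name `step_value`, the tuple layout and the optional `end` argument are illustrative assumptions, not part of WebVMT.

```python
from bisect import bisect_right


def step_value(samples, t, end=None):
    """Return the latest sample value at or before time t, or None.

    samples: list of (start_seconds, value) tuples sorted by start time.
    end: optional time after which no value is reported (assumed behaviour).
    """
    if end is not None and t > end:
        return None
    starts = [start for start, _ in samples]
    index = bisect_right(starts, t) - 1
    return samples[index][1] if index >= 0 else None


# live1 gear samples: "4" at 4 s, "5" at 6 s, closing sample "5" at 9 s
live1 = [(4.0, "4"), (6.0, "5"), (9.0, "5")]
```

During capture the list only ever grows at its tail, so the lookup gives the same answer for past times whether it runs live or during later playback.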

3.5.5 Interpolated Path

A WebVMT path describes the trajectory of a moving object which consists of a timed sequence of locations. The object's location may be interpolated between consecutive values in the sequence to calculate the distance travelled over time.

The path attribute may be set to identify an individual path. This allows a path:

  • to be styled with CSS, e.g. colour;
  • to be associated with speed and distance attributes during playback;
  • to be uniquely associated with the video footage.

In this example, an interpolated path is traced from London to Brighton:

Example 19: London to Brighton WebVMT
WEBVMT

NOTE Associated video

MEDIA
url:LondonBrighton.mp4
mime-type:video/mp4
start-time:2018-02-19T12:34:56.789Z
path:cam1

NOTE Map config

MAP
lat:51.1618 lng:-0.1428
rad:20000

NOTE London overview

00:00:01.000 -->
{ "pan-to":
  { "lat": 51.4952, "lng": -0.1441 }
}

00:00:02.000 -->
{ "zoom":
  { "rad": 10000 }
}

NOTE From London Victoria...

00:00:03.000 -->
{ "pan-to":
  { "lat": 50.830553, "lng": -0.141706, "end": "00:00:25.000" }
}
{ "move-to":
  { "lat": 51.494477, "lng": -0.144753, "path": "cam1" }
}
{ "line-to":
  { "lat": 51.155958, "lng": -0.16089, "path": "cam1", "end": "00:00:10.000" }
}

NOTE ...via Gatwick Airport...

00:00:10.000 -->
{ "line-to":
  { "lat": 50.830553, "lng": -0.141706, "path": "cam1", "end": "00:00:25.000" }
}

NOTE ...to Brighton (at 00:00:25.000)

00:00:27.000 -->
{ "zoom":
  { "rad": 20000 }
}

3.5.6 Interpolated Annotation

Interpolation can also be applied to the attributes of a WebVMT command and a map annotation may be animated in this way.

This example tracks a drone with a circular 10-meter safety zone around it.

Example 20: Safe Drone WebVMT
WEBVMT

NOTE Associated video

MEDIA
url:SafeDrone.mp4
mime-type:video/mp4

NOTE Map config

MAP
lat:51.0130 lng:-0.0015
rad:1000

NOTE Drone starts at (51.0130, -0.0015)

00:00:05.000 -->
{ "pan-to":
  { "lat": 51.0070, "lng": -0.0020, "end": "00:00:25.000" }
}
{ "move-to":
  { "lat": 51.0130, "lng": -0.0015, "path": "drone1" }
}
{ "line-to":
  { "lat": 51.0090, "lng": -0.0017, "path": "drone1", "end": "00:00:10.000" }
}

NOTE Safety zone

00:00:05.000 --> 00:00:10.000
{ "circle":
  { "lat": 51.0130, "lng": -0.0015, "rad": 10 }
}
{ "interp":
  { "to": { "lat": 51.0090, "lng": -0.0017 } }
}

NOTE Drone arrives at (51.0090, -0.0017)

00:00:10.000 -->
{ "line-to":
  { "lat": 51.0070, "lng": -0.0020, "path": "drone1", "end": "00:00:25.000" }
}
{ "circle":
  { "lat": 51.0090, "lng": -0.0017, "rad": 10 }
}
{ "interp":
  { "end": "00:00:25.000", "to": { "lat": 51.0070, "lng": -0.0020 } }
}

NOTE Drone ends at (51.0070, -0.0020)

3.6 YouTube Integration

Embedded YouTube content can be displayed using an <iframe> element, specifying the unique 11-character content identifier for the posted video, using the official YouTube IFrame API with the JavaScript API enabled.

3.6.1 Hello YouTube

A child <track> pseudo-element within the <iframe> links it with WebVMT using the same syntax as for the <video> DOM element.

Example 21: WebVMT YouTube HTML
<!doctype html>
<html>
  <head>
    <title>WebVMT YouTube Example</title>
  </head>
  <body>
    <!-- Video display -->
    <iframe src="http://www.youtube.com/embed/YOUTUBE_VIDEO_ID?enablejsapi=1" width="640" height="360" frameborder="0">
      <track src="maptrack.vmt" kind="metadata" for="vmt-map" tileurl="mapbox://styles/mapbox/streets-v9">
    </iframe>
    <!-- Map display -->
    <div id="vmt-map"></div>
  </body>
</html>

Note that the <track> pseudo-element is actually replaced by the <iframe> content when the page is loaded.

The url in the MEDIA block should match the src attribute of the <iframe> element without the query.

Example 22: YouTube WebVMT Fragment
WEBVMT

NOTE Associated YouTube video

MEDIA
url:http://www.youtube.com/embed/YOUTUBE_VIDEO_ID
mime-type:video/mp4

4. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

4.1 Conformance classes

This specification describes the conformance criteria for user agents (relevant to implementors) and WebVMT files (relevant to authors and authoring tool implementors).

Note

Syntax defines what constitutes a valid WebVMT file. Authors need to follow the requirements therein, and are encouraged to use a conformance checker. Parsing defines how user agents are to interpret a file labelled as text/vmt, for both valid and invalid WebVMT files. The parsing rules are more tolerant to author errors than the syntax allows, in order to provide for extensibility and to still render cues that have some syntax errors.

Example 23: Conformance Error
For example, the parser will create two cues even if the blank line between them is skipped. This is clearly a mistake, so a conformance checker will flag it as an error, but it is still useful to render the cues to the user.

User agents fall into several (possibly overlapping) categories with different conformance requirements.

User agents that support scripting
All processing requirements in this specification apply. The user agent must also be a conforming implementation of the IDL fragments in this specification, as described in the Web IDL specification.
User agents with no scripting support
All processing requirements in this specification apply, except those in Planned Interfaces.
User agents that do not support CSS
All processing requirements in this specification apply, except parts of Parsing that relate to stylesheets and CSS, and all of CSS Extensions. The user agent must instead only render the WebVMT cue in an appropriate manner. Any other styling instructions are optional.
User agents that do not support a full HTML CSS engine
All processing requirements in this specification apply. However, the user agent will need to apply the CSS related features in Parsing and CSS Extensions in such a way that the rendered results are equivalent to what a full CSS supporting renderer produces.
User agents that support a full HTML CSS engine
All processing requirements in this specification apply. However, only a limited set of CSS styles is allowed because user agents that do not support a full HTML CSS engine will need to implement CSS functionality equivalents. User agents that support a full CSS engine must therefore limit the CSS styles they apply for WebVMT so as to enable identical rendering without bleeding in extra CSS styles that are beyond the WebVMT specification.
Conformance checkers
Conformance checkers must verify that a WebVMT file conforms to the applicable conformance criteria described in this specification. The term "validator" is equivalent to conformance checker for the purpose of this specification.
Authoring tools
Authoring tools must generate conforming WebVMT files. Tools that convert other formats to WebVMT are also considered to be authoring tools.
When an authoring tool is used to edit a non-conforming WebVMT file, it may preserve the conformance errors in sections of the file that were not edited during the editing session (i.e. an editing tool is allowed to round-trip erroneous content). However, an authoring tool must not claim that the output is conformant if errors have been so preserved.

4.2 Unicode normalization

Implementations of this specification must not normalize Unicode text during processing.

Example 24: Unicode Matching
For example, a cue with an identifier consisting of the characters U+0041 LATIN CAPITAL LETTER A followed by U+030A COMBINING RING ABOVE (a decomposed character sequence), or the character U+212B ANGSTROM SIGN (a compatibility character), will not match a selector targeting a cue with an ID consisting of the character U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE (a precomposed character).
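The matching rule in the example above can be demonstrated with a short Python sketch. The helper name `identifiers_match` is hypothetical; the call to `unicodedata.normalize` is included only to show what a normalizing implementation would wrongly do.

```python
import unicodedata

decomposed = "\u0041\u030A"  # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
angstrom = "\u212B"          # ANGSTROM SIGN
precomposed = "\u00C5"       # LATIN CAPITAL LETTER A WITH RING ABOVE


def identifiers_match(a, b):
    """Compare identifiers code point by code point, with no normalization."""
    return a == b


# What a (non-conforming) normalizing comparison would see instead.
normalized_decomposed = unicodedata.normalize("NFC", decomposed)
normalized_angstrom = unicodedata.normalize("NFC", angstrom)
```

All three strings render alike, yet only the normalized forms compare equal to the precomposed character, which is why normalization must be avoided when matching identifiers.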

5. Data Model

Editor's note

The data model of WebVMT consists of four key elements: the linked media file, the video viewport, cues, and the map viewport. The linked media file contains audio or video data with which cues are synchronized. The video viewport is the rendering area for video output. Cues are containers consisting of a set of metadata lines. The map viewport is the rendering area for metadata output, for example graphical annotations overlaid on an online map.

5.1 Overview

This section is non-normative.

The WebVMT file is a container file for chunks of data that are time-aligned with a video or audio resource. It can therefore be regarded as a serialisation format for time-aligned data.

A WebVMT file starts with a header and then contains a series of data blocks. If a data block has a start time, it is called a WebVMT cue. A comment is another kind of data block.

A WebVMT file carries cues which are identified as metadata and specified in the kind attribute of the track element in the HTML specification.

The data kind of a WebVMT file is externally specified, such as in an HTML file’s track element. The environment is responsible for interpreting the data correctly.

A WebVMT cue is rendered as an overlay on top of the map viewport.

5.2 WebVMT Cue

A WebVMT cue is a text track cue that additionally consists of the following:

A cue text
The raw text of the cue which is interpreted as time-aligned metadata, and rules for its interpretation.
Note: Cue End Time Omission

A WebVMT cue without an end time indicates that the cue is an unbounded text track cue, for example during live streaming when the time of the next data sample is unknown or when the duration of the media is unknown.

5.3 WebVMT Location

A WebVMT location consists of:

A location latitude
The latitude in degrees of the location.
A location longitude
The longitude in degrees of the location.
A location altitude
Optionally, the altitude in meters of the location.
Note: Co-ordinate Reference System

Location information is provided in terms of World Geodetic System coordinates, WGS84. Altitude is measured in meters above the WGS84 ellipsoid, and should not be confused with the height above mean sea level.

5.4 WebVMT Map

A WebVMT map is the map viewport and provides a rendering area for WebVMT cues.

A WebVMT map consists of:

A map center location
The WebVMT location at the center of the map.
A map zoom radius
The radius in meters of the minimum area visible from the map center location.
A map interface object
The control interface object for the map.
Note: Map Interface Object

The precise format of the map interface object is implementation dependent, for example OpenLayers API or Leaflet API.

For parsing, we also need the following:

A text track map
A WebVMT map associated with the text track.
By default, the text track map is set to null.

5.5 WebVMT Media

A WebVMT media is metadata for the linked media with which WebVMT cues are synchronized, for example audio or video.

Note: Search Engines

A WebVMT media enables a web crawler to rapidly search media metadata by providing sufficient information to construct a time-metadata index of the linked media file without opening it. Search engine data throughput is reduced as only matching media files selected by the user need be read, and non-matching media files are not accessed at all. Care should be taken to maintain WebVMT media details correctly, for example when a media file is renamed.

A WebVMT media consists of:

A media URL
The URL of the linked media file.
Note: Media URL

A null media URL indicates that no linked media file exists.

A media MIME type
The MIME type of the linked media file.
Note: Media MIME Type

A null media MIME type indicates that no linked media file exists.

A media start time
The global time and date at which the linked media file begins.
Note: Media Start Time

The media start time allows multiple WebVMT files to be aggregated. A null media start time indicates that no start time is associated, for example in the case of an animation.

A media path
The path identifier which uniquely identifies the moving object capturing the linked media file.
Note: Media Path

A null media path indicates that no moving object is associated, for example when no linked media file exists.
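One way the media start time could support aggregation is by placing each cue on a shared UTC timeline. The following Python sketch adds a cue offset to a media start time; the function name `cue_utc_time` is hypothetical, and the start time used is the one from the London to Brighton example.

```python
from datetime import datetime, timedelta, timezone


def cue_utc_time(media_start, offset_seconds):
    """Return the UTC time of a cue offset, given a media start time string."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11,
    # so it is replaced with an explicit UTC offset here.
    start = datetime.fromisoformat(media_start.replace("Z", "+00:00"))
    return start.astimezone(timezone.utc) + timedelta(seconds=offset_seconds)


# Cue starting 3 seconds into media that began at 12:34:56.789 UTC
when = cue_utc_time("2018-02-19T12:34:56.789Z", 3.0)
```

Cues from several files can then be merged by sorting on the resulting UTC times, assuming every file carries a non-null media start time.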

5.6 WebVMT Command Structures

A WebVMT command is an instruction to display WebVMT metadata content.

A WebVMT command consists of one of the following components:

Note: Command Execution Order

WebVMT commands are executed in order from first to last in the WebVMT file.

5.6.1 Map Controls

A WebVMT map control command controls map presentation.

A WebVMT map control command consists of one of the following components:

A WebVMT pan is a command to set the location of the map center.

A WebVMT pan consists of:

A pan location
The WebVMT location to which the map center location pans.
A pan start time
The time at which the map starts panning towards the pan location.
The pan start time is set to the cue start time.
A pan end time
The time at which the map center location equals the pan location.
The pan end time may be defined as an absolute value, or calculated relative to the pan start time using a duration.

A WebVMT zoom is a command to set the level of detail of the map.

A WebVMT zoom consists of:

A zoom radius
The radius in meters of the map zoom radius.

5.6.2 Zones

A WebVMT zone consists of all the WebVMT zone fragments with the same zone identifier.

A WebVMT zone fragment command consists of one of the following components:

A WebVMT circle is a command to annotate the map with a circular area.

A WebVMT circle consists of:

A zone identifier
The identifier shared by all the WebVMT zone fragments in the WebVMT zone.
By default, the zone identifier is set to null.
A circle location
The WebVMT location of the circle center.
A circle radius
The radius in meters of the circle.

A WebVMT polygon is a command to annotate the map with a polygonal area.

A WebVMT polygon consists of:

A zone identifier
The identifier shared by all the WebVMT zone fragments in the WebVMT zone.
By default, the zone identifier is set to null.
A list of WebVMT locations defining the polygon vertices.
Vertex locations are listed sequentially around the perimeter of the polygon. The last vertex should not repeat the value of the first, as this is implicit.
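Because the closing edge of a polygon is implicit, a renderer that needs an explicitly closed ring has to append the first vertex itself. A minimal Python sketch follows; the function name `closed_ring` and the (lat, lng) tuple representation are illustrative assumptions.

```python
def closed_ring(vertices):
    """Return the polygon perimeter as a closed ring of (lat, lng) tuples."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    # Copy the list so the caller's open vertex list is left unchanged.
    return list(vertices) + [vertices[0]]


triangle = [(51.0, -0.1), (51.1, -0.1), (51.0, 0.0)]
ring = closed_ring(triangle)
```

The authored list stays open, as the text above recommends, and the closed form exists only inside the renderer.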

5.6.3 Paths

A WebVMT path consists of all the path segments with the same path identifier.

A path segment consists of a sequence of contiguous WebVMT path fragments that describe the trajectory of an object moving through the mapped space.

Note: Non-Contiguous Segments

A WebVMT path may include non-contiguous path segments, but each path segment must contain a sequence of contiguous WebVMT path fragments.

A path segment consists of the following components, in the order given:

  1. One WebVMT move command;
  2. Zero or more WebVMT line commands.

A WebVMT path fragment command consists of one of the following components:

A WebVMT move command sets the start location of the first WebVMT path fragment in a path segment.

A WebVMT move consists of:

A path identifier
The identifier shared by all the path segments in the WebVMT path.
By default, the path identifier is set to null.
A fragment start time
The time at which the WebVMT path fragment starts.
The fragment start time is set to the cue start time.
A fragment start location
The WebVMT location at the fragment start time.

A WebVMT line command sets the end location of the WebVMT path fragment. The fragment start location is set by the preceding WebVMT path fragment in the WebVMT path.

A WebVMT line consists of:

A path identifier
The identifier shared by all the path segments in the WebVMT path.
By default, the path identifier is set to null.
A fragment end time
The time at which the WebVMT path fragment ends.
By default, the fragment end time is set to the cue end time.
The fragment end time may be defined as an absolute value, or calculated relative to the fragment start time using a duration.
A fragment end location
The WebVMT location at the fragment end time.
Note: Path Interpolation

A WebVMT line is a straight line from the start location to the end location. The location of the moving object can be linearly interpolated between the fragment start time and the fragment end time.
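The path interpolation described in the note above might be sketched in Python as follows. The function name `interpolate_location` is hypothetical, and interpolating latitude and longitude directly in degrees is a simplifying assumption that suits short fragments; it is not a geodesic calculation.

```python
def interpolate_location(t, t0, start, t1, end):
    """Linearly interpolate a (lat, lng) pair at time t along one path fragment."""
    if not t0 <= t <= t1:
        raise ValueError("time is outside the fragment interval")
    fraction = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
    return tuple(a + (b - a) * fraction for a, b in zip(start, end))


# London Victoria at 3 s to Gatwick Airport at 10 s, sampled halfway at 6.5 s
victoria = (51.494477, -0.144753)
gatwick = (51.155958, -0.16089)
halfway = interpolate_location(6.5, 3.0, victoria, 10.0, gatwick)
```

The same function works for tuples that include altitude, since it interpolates each component independently.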

5.6.4 Synchronized Data

A WebVMT synchronized data synchronizes a sample from a data source with a WebVMT cue.

A WebVMT synchronized data command consists of:

A synchronized data object
An arbitrary object representing the raw sample from the data source.
A synchronized data type
An associated data type, e.g. org.geojson.
Optionally, a synchronized data identifier
An identifier shared by all samples from the same data source over time, e.g. for interpolation.
Optionally, a synchronized path identifier
A path identifier associated with the data source, e.g. a moving sensor.

5.6.5 Attribute Interpolation

A WebVMT interpolation changes an object attribute from a start value to an end value over a time interval.

A WebVMT interpolation consists of:

An interpolation object
The parent object of the attribute.
An interpolation attribute
The name of the attribute to change.
An interpolation start time
The time at which the interpolation starts.
By default, the interpolation start time is set to the cue start time.
The interpolation start time may be defined as an absolute value, or calculated relative to the cue start time using a delay.
An interpolation start value
The initial value of the attribute.
The interpolation start value is set by the value of the interpolation attribute at the interpolation start time.
An interpolation end time
The time at which the interpolation ends.
By default, the interpolation end time is set to the cue end time.
The interpolation end time may be defined as an absolute value, or calculated relative to the interpolation start time using a duration.
An interpolation end value
The final value of the attribute.

A WebVMT interpolation list consists of one or more WebVMT interpolations with all interpolation objects set to the preceding WebVMT command.
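The defaults for the interpolation start and end times could be resolved as in this Python sketch. The function and parameter names (`resolve_interval`, `start`, `delay`, `end`, `duration`) are illustrative only, and rejecting an absolute value combined with a relative one is an assumption rather than a stated rule. Times are in seconds.

```python
def resolve_interval(cue_start, cue_end=None, start=None, delay=None,
                     end=None, duration=None):
    """Return (start, end) for an interpolation; end is None if still unknown."""
    # Assumption: an absolute value and a relative value are mutually exclusive.
    if start is not None and delay is not None:
        raise ValueError("give an absolute start or a delay, not both")
    if end is not None and duration is not None:
        raise ValueError("give an absolute end or a duration, not both")
    if start is None:
        # Default is the cue start time, optionally shifted by a delay.
        start = cue_start + (delay or 0.0)
    if end is None:
        # A duration is relative to the interpolation start; else use the cue end.
        end = start + duration if duration is not None else cue_end
    return start, end
```

For an unbounded cue with no explicit end or duration, the returned end is None, matching the live capture case where the end time is not yet known.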

6. Syntax

6.1 WebVMT File Structure

A WebVMT file must consist of a WebVMT file body encoded as UTF-8 and labeled with the MIME type text/vmt.

A WebVMT file body consists of the following components, in the order given:

  1. An optional U+FEFF BYTE ORDER MARK (BOM) character.
  2. The string "WEBVMT" (U+0057 LATIN CAPITAL LETTER W, U+0045 LATIN CAPITAL LETTER E, U+0042 LATIN CAPITAL LETTER B, U+0056 LATIN CAPITAL LETTER V, U+004D LATIN CAPITAL LETTER M, U+0054 LATIN CAPITAL LETTER T).
  3. Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER TABULATION (tab) character followed by any number of characters that are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters.
  4. Two or more WebVMT line terminators to terminate the line with the file magic and separate it from the rest of the body.
  5. The following components, in any order, separated from each other by one or more WebVMT line terminators.
  6. Zero or more WebVMT line terminators.
  7. Zero or more WebVMT cue blocks and WebVMT comment blocks separated from each other by one or more WebVMT line terminators.
  8. Zero or more WebVMT line terminators.
A WebVMT line terminator consists of one of the following:

A WebVMT media metadata block consists of the following components, in the order given:

  1. The string "MEDIA" (U+004D LATIN CAPITAL LETTER M, U+0045 LATIN CAPITAL LETTER E, U+0044 LATIN CAPITAL LETTER D, U+0049 LATIN CAPITAL LETTER I, U+0041 LATIN CAPITAL LETTER A).
  2. Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  3. A WebVMT line terminator.
  4. A WebVMT media settings list.
  5. A WebVMT line terminator.
Note: Media Metadata

The WebVMT media metadata block provides hints about the linked media file for web crawlers and search engines.

A WebVMT map initialisation block consists of the following components, in the order given:

  1. The string "MAP" (U+004D LATIN CAPITAL LETTER M, U+0041 LATIN CAPITAL LETTER A, U+0050 LATIN CAPITAL LETTER P).
  2. Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  3. A WebVMT line terminator.
  4. A WebVMT map settings list.
  5. A WebVMT line terminator.
Note: Map Initialisation

The WebVMT map initialisation block defines the state of the WebVMT map before any WebVMT cues are active.

A WebVMT style block consists of the following components, in the order given:

  1. The string "STYLE" (U+0053 LATIN CAPITAL LETTER S, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+004C LATIN CAPITAL LETTER L, U+0045 LATIN CAPITAL LETTER E).
  2. Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  3. A WebVMT line terminator.
  4. Any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVMT line terminator, except that the entire resulting string must not contain the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN). The string represents a CSS style sheet; the requirements given in the relevant CSS specifications apply.
  5. A WebVMT line terminator.

A WebVMT cue block consists of the following components, in the order given:

  1. Optionally, a WebVMT cue identifier followed by a WebVMT line terminator.
  2. WebVMT cue timings.
  3. Zero or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  4. A WebVMT line terminator.
  5. The WebVMT cue payload consists of a WebVMT metadata text, but must not contain the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
  6. A WebVMT line terminator.
Note: Cues

A WebVMT cue block corresponds to one piece of time-aligned data in the WebVMT file. The WebVMT cue payload is the data associated with the WebVMT cue.

A WebVMT cue identifier is any sequence of one or more characters not containing the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), nor containing any U+000A LINE FEED (LF) characters or U+000D CARRIAGE RETURN (CR) characters.

A WebVMT cue identifier must be unique amongst all the WebVMT cue identifiers of all WebVMT cues of a WebVMT file.

Note: Cue Identifiers

A WebVMT cue identifier can be used to identify a specific cue, for example from script or CSS.

The WebVMT cue timings part of a WebVMT cue block consists of the following components, in the order given:

  1. A WebVMT timestamp representing the start time offset of the cue. The time represented by this WebVMT timestamp must be greater than or equal to the start time offsets of all previous cues in the file.
  2. One or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  3. The string "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
  4. One or more U+0020 SPACE characters or U+0009 CHARACTER TABULATION (tab) characters.
  5. Optionally, a WebVMT timestamp representing the end time offset of the cue. The time represented by this WebVMT timestamp must be greater than or equal to the start time offset of the cue.
Note: Cue Timings

The WebVMT cue timings give the start and end offsets of the WebVMT cue block. Different cues can overlap. Cues are always listed ordered by their start time.
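The two ordering rules for cue timings lend themselves to a simple check of the kind a conformance checker might run. The Python sketch below assumes cue timings have already been parsed into (start, end) pairs in seconds, with end set to None for unbounded cues; the function name `check_cue_order` is hypothetical.

```python
def check_cue_order(cues):
    """Return a list of error messages for a sequence of (start, end) pairs."""
    errors = []
    latest_start = None
    for index, (start, end) in enumerate(cues):
        # Rule 1: a start time must not precede any earlier cue's start time.
        if latest_start is not None and start < latest_start:
            errors.append(f"cue {index}: start {start} precedes {latest_start}")
        # Rule 2: an end time, when present, must not precede its own start.
        if end is not None and end < start:
            errors.append(f"cue {index}: end {end} precedes start {start}")
        latest_start = start if latest_start is None else max(latest_start, start)
    return errors
```

Overlapping cues pass this check, since only start times are compared between cues.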

A WebVMT timestamp consists of the following components, in the order given:

  1. Optionally (required if hours is non-zero):
    1. Two or more ASCII digits, representing the hours as a base ten integer.
    2. A U+003A COLON character (:).
  2. Two ASCII digits, representing the minutes as a base ten integer in the range 0 ≤ minutes ≤ 59.
  3. A U+003A COLON character (:).
  4. Two ASCII digits, representing the seconds as a base ten integer in the range 0 ≤ seconds ≤ 59.
  5. A U+002E FULL STOP character (.).
  6. Three ASCII digits, representing the thousandths of a second seconds-frac as a base ten integer.
Note: Timestamp Interpretation

A WebVMT timestamp is always interpreted relative to the current playback position of the media data with which the WebVMT file is to be synchronized.
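The timestamp syntax above maps directly onto a regular expression. This Python sketch converts a timestamp string to seconds; the function name `parse_timestamp` is hypothetical, and explicit `[0-9]` classes are used so that only ASCII digits match.

```python
import re

# Optional hours (two or more digits), then mm:ss.ttt with minutes and
# seconds restricted to 00-59.
_TIMESTAMP = re.compile(
    r"(?:([0-9]{2,}):)?([0-5][0-9]):([0-5][0-9])\.([0-9]{3})"
)


def parse_timestamp(text):
    """Return the timestamp as seconds, or raise ValueError if malformed."""
    match = _TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid timestamp: {text!r}")
    hours, minutes, seconds, millis = match.groups()
    return (int(hours or 0) * 3600 + int(minutes) * 60
            + int(seconds) + int(millis) / 1000)
```

For instance, `parse_timestamp("00:00:04.000")` gives 4.0, and a value with 60 in the minutes field is rejected.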

A WebVMT comment block consists of the following components, in the order given:

  1. The string "NOTE" (U+004E LATIN CAPITAL LETTER N, U+004F LATIN CAPITAL LETTER O, U+0054 LATIN CAPITAL LETTER T, U+0045 LATIN CAPITAL LETTER E).
  2. Optionally, the following components, in the order given:
    1. Either:
    2. Any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVMT line terminator, except that the entire resulting string must not contain the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
  3. A WebVMT line terminator.
Note: Comment Parsing

A WebVMT comment block is ignored by the parser.

6.2 WebVMT Cue Payload

WebVMT metadata text consists of any sequence of zero or more characters other than U+000A LINE FEED (LF) characters and U+000D CARRIAGE RETURN (CR) characters, each optionally separated from the next by a WebVMT line terminator. (In other words, any text that does not have two consecutive WebVMT line terminators and does not start or end with a WebVMT line terminator.)

The string represents a WebVMT command list.

WebVMT metadata text cues are only useful for scripted applications (e.g. using the metadata text track kind in an HTML text track).

6.3 WebVMT Media Settings

The WebVMT media settings list consists of zero or more of the following components, in any order, separated from each other by one or more U+0020 SPACE characters, U+0009 CHARACTER TABULATION (tab) characters, or WebVMT line terminators, except that the string must not contain two consecutive WebVMT line terminators. Each component must not be included more than once per WebVMT media settings list string.

A WebVMT media url setting consists of the following components, in the order given:

  1. The string "url".
  2. A U+003A COLON character (:).
  3. A valid URL.
Note: URL Resolution

For the purpose of resolving a URL in the MEDIA block of a WebVMT file, or any URLs in resources referenced from MEDIA blocks of a WebVMT file, if the URL’s scheme is not "data", then the user agent must act as if the URL failed to resolve. If the url value does not match the src attribute of the HTML <track> element, then the src value takes precedence.

A WebVMT media MIME type setting consists of the following components, in the order given:

  1. The string "mime-type".
  2. A U+003A COLON character (:).
  3. A valid MIME type.

A WebVMT media start time setting consists of the following components, in the order given:

  1. The string "start-time".
  2. A U+003A COLON character (:).
  3. A valid global date and time string.
Note: Media Start Time

WebVMT media start time setting should include millisecond data in order to allow the WebVMT file to be accurately synchronized with Coordinated Universal Time (UTC).

A WebVMT media path setting consists of the following components, in the order given:

  1. The string "path".
  2. A U+003A COLON character (:).
  3. A WebVMT path identifier.
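A media settings list could be read into a dictionary as in the following Python sketch. The function name `parse_media_settings` is hypothetical, values are kept as unvalidated strings, and `str.split()` separates on any whitespace, which is slightly more tolerant than the separators listed above. It also assumes values contain no whitespace.

```python
def parse_media_settings(text):
    """Parse 'name:value' media settings into a dict of strings."""
    allowed = {"url", "mime-type", "start-time", "path"}
    settings = {}
    for item in text.split():
        # Split on the first colon only, so URLs keep their own colons.
        name, separator, value = item.partition(":")
        if not separator or name not in allowed or not value:
            raise ValueError(f"invalid media setting: {item!r}")
        if name in settings:
            raise ValueError(f"duplicate media setting: {name}")
        settings[name] = value
    return settings


# Settings from the London to Brighton example
media = parse_media_settings(
    "url:LondonBrighton.mp4\nmime-type:video/mp4\n"
    "start-time:2018-02-19T12:34:56.789Z\npath:cam1"
)
```

An empty settings list yields an empty dictionary, which corresponds to every media field being null.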

6.4 WebVMT Map Settings

The WebVMT map settings list consists of the following components, in any order, separated from each other by one or more U+0020 SPACE characters, U+0009 CHARACTER TABULATION (tab) characters, or WebVMT line terminators, except that the string must not contain two consecutive WebVMT line terminators. Each component must be included once per WebVMT map settings list string.

Note: Initial Map State

The WebVMT map settings list defines the WebVMT map state before the first cue is active.

A WebVMT map center latitude setting consists of a WebVMT latitude setting.

A WebVMT map center longitude setting consists of a WebVMT longitude setting.

A WebVMT map center altitude setting consists of a WebVMT altitude setting.

Note: Map Center Location

When interpreted as numbers, the WebVMT map center latitude setting, WebVMT map center longitude setting and WebVMT map center altitude setting values represent the map center location.

A WebVMT latitude setting consists of the following components, in the order given:

  1. The string "lat".
  2. A U+003A COLON character (:).
  3. A WebVMT latitude.

A WebVMT latitude consists of the following components, in the order given:

  1. Optionally, a U+002D HYPHEN-MINUS character (-).
  2. One or more ASCII digits.
  3. Optionally:
    1. A U+002E FULL STOP character (.).
    2. One or more ASCII digits.
Note: Latitude Range

When interpreted as a number, a WebVMT latitude must be in the range -90..+90.

A WebVMT longitude setting consists of the following components, in the order given:

  1. The string "lng".
  2. A U+003A COLON character (:).
  3. A WebVMT longitude.

A WebVMT longitude consists of the following components, in the order given:

  1. Optionally, a U+002D HYPHEN-MINUS character (-).
  2. One or more ASCII digits.
  3. Optionally:
    1. A U+002E FULL STOP character (.).
    2. One or more ASCII digits.
Note: Longitude Range

When interpreted as a number, a WebVMT longitude must be in the range -180..+180.

A WebVMT altitude setting consists of the following components, in the order given:

  1. The string "alt".
  2. A U+003A COLON character (:).
  3. A WebVMT altitude.

A WebVMT altitude consists of the following components, in the order given:

  1. Optionally, a U+002D HYPHEN-MINUS character (-).
  2. One or more ASCII digits.
  3. Optionally:
    1. A U+002E FULL STOP character (.).
    2. One or more ASCII digits.
Note: Altitude Zero

When interpreted as a number, a WebVMT altitude represents the height in meters above the WGS84 ellipsoid. Care should be taken not to confuse this with the height above mean sea level.

A WebVMT map zoom setting consists of the following components, in the order given:

  1. The string "rad".
  2. A U+003A COLON character (:).
  3. One or more ASCII digits.
  4. Optionally:
    1. A U+002E FULL STOP character (.).
    2. One or more ASCII digits.
Note: Zoom Radius

When interpreted as a number, the WebVMT map zoom setting must be positive and represents the map zoom radius.
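The number syntax and range notes for map settings can be combined into one validating parser, sketched here in Python. The function name `parse_map_settings` is hypothetical, and treating lat, lng and rad as required with alt as optional is an assumption based on the examples in this document.

```python
import re

# Optional minus sign, digits, and an optional fractional part.
_NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")
_LIMITS = {"lat": 90.0, "lng": 180.0}


def parse_map_settings(text):
    """Parse map settings such as 'lat:51.1 lng:-0.1 rad:20000' into floats."""
    settings = {}
    for item in text.split():
        name, _, value = item.partition(":")
        if name not in ("lat", "lng", "alt", "rad") or not _NUMBER.fullmatch(value):
            raise ValueError(f"invalid map setting: {item!r}")
        if name in settings:
            raise ValueError(f"duplicate map setting: {name}")
        number = float(value)
        if name in _LIMITS and abs(number) > _LIMITS[name]:
            raise ValueError(f"{name} out of range: {value}")
        if name == "rad" and number <= 0:
            raise ValueError("rad must be positive")
        settings[name] = number
    # Assumption: alt may be omitted, the other three may not.
    missing = [key for key in ("lat", "lng", "rad") if key not in settings]
    if missing:
        raise ValueError("missing map settings: " + ", ".join(missing))
    return settings


# MAP block from the London to Brighton example
london = parse_map_settings("lat:51.1618 lng:-0.1428\nrad:20000")
```

Keeping syntax and range checks in one place means a malformed MAP block is rejected before any map state is created.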

6.5 WebVMT Commands

A WebVMT command list consists of one or more of the following components in any order, separated from each other by a WebVMT line terminator:

6.5.1 WebVMT Map Commands

A WebVMT map control command consists of one of the following components:

A WebVMT pan command consists of a JSON text representing the following JSON object:

A WebVMT pan parameter list is a JSON object representing the following components in any order:

A WebVMT pan latitude attribute consists of a WebVMT latitude attribute.

A WebVMT pan longitude attribute consists of a WebVMT longitude attribute.

A WebVMT pan altitude attribute consists of a WebVMT altitude attribute.

A WebVMT pan end time attribute consists of a WebVMT end time attribute.

A WebVMT pan duration attribute consists of a WebVMT duration attribute.

A WebVMT zoom command consists of a JSON text representing the following JSON object:

A WebVMT zoom parameter list is a JSON object representing the following component:

A WebVMT zoom radius attribute consists of a WebVMT radius attribute.

Note: Zoom Radius

When interpreted as a number, the WebVMT zoom radius attribute value represents the map zoom radius.

A WebVMT radius attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "rad".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON number greater than zero.

6.5.2 WebVMT Zone Commands

A WebVMT zone annotation command consists of one of the following components:

A WebVMT circle command consists of a JSON text representing the following JSON object:

A WebVMT circle parameter list consists of a JSON object representing the following components in any order:

A WebVMT circle center latitude attribute consists of a WebVMT latitude attribute.

A WebVMT circle center longitude attribute consists of a WebVMT longitude attribute.

A WebVMT circle center altitude attribute consists of a WebVMT altitude attribute.

A WebVMT circle radius attribute consists of a WebVMT radius attribute.

A WebVMT zone attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "zone".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON string representing a WebVMT zone identifier.

A WebVMT zone identifier is any sequence of one or more characters not containing the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), nor containing any U+000A LINE FEED (LF) characters or U+000D CARRIAGE RETURN (CR) characters.

Note: Zone Identification

A WebVMT zone identifier is a string which uniquely identifies a zone in the WebVMT file, for example a safety zone around a moving object.
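The lexical rule for a zone identifier can be expressed as a short predicate. This is a minimal sketch; the function name `is_valid_identifier` is an assumption of this example. The path identifier in section 6.5.3 is defined with the same wording, so the same check would apply there.

```python
def is_valid_identifier(identifier: str) -> bool:
    """Check the lexical rule for a WebVMT zone identifier."""
    return (
        len(identifier) >= 1            # one or more characters
        and "-->" not in identifier     # no cue timing arrow
        and "\n" not in identifier      # no LINE FEED
        and "\r" not in identifier      # no CARRIAGE RETURN
    )
```

The check covers syntax only. Uniqueness within the file has to be verified separately, for example by collecting identifiers in a set while reading the cues.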

A WebVMT polygon command consists of a JSON text representing the following JSON object:

A WebVMT polygon parameter list consists of the following JSON object:

A WebVMT zone perimeter list consists of the following JSON object:

A WebVMT vertices list consists of a JSON array of three or more JSON objects each representing a WebVMT location attribute list.

A WebVMT location attribute list consists of a JSON text representing a list of the following JSON values in any order, separated from each other by a U+002C COMMA character (,):

A WebVMT latitude attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "lat".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON number.

Note: Latitude Range

When interpreted as a number, a WebVMT latitude attribute must be in the range -90..+90.

A WebVMT longitude attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "lng".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON number.

Note: Longitude Range

When interpreted as a number, a WebVMT longitude attribute must be in the range -180..+180.

A WebVMT altitude attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "alt".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON number.

Note: Altitude Zero

When interpreted as a number, a WebVMT altitude represents the height in meters above the WGS84 ellipsoid. Care should be taken not to confuse this with the height above mean sea level.
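Once a command has been parsed as JSON, the range rules for the location attributes can be checked on the resulting object. The sketch below is illustrative: the function name `check_location` is hypothetical, and it assumes that "lat" and "lng" are required while "alt" is optional, which matches the examples in this document but is an assumption of this sketch.

```python
from numbers import Real

# Inclusive ranges taken from the Latitude Range and Longitude Range notes.
RANGES = {"lat": (-90, 90), "lng": (-180, 180)}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, but JSON true/false are not numbers.
    return isinstance(value, Real) and not isinstance(value, bool)


def check_location(params: dict) -> list:
    """Return a list of problems found in a parsed location attribute list."""
    errors = []
    for name, (low, high) in RANGES.items():
        if name not in params:
            errors.append(f"missing {name}")  # assumption: lat and lng are required
        elif not _is_number(params[name]):
            errors.append(f"{name} is not a JSON number")
        elif not low <= params[name] <= high:
            errors.append(f"{name} out of range {low}..{high}")
    # Altitude has no range limit; it only needs to be a number when present.
    if "alt" in params and not _is_number(params["alt"]):
        errors.append("alt is not a JSON number")
    return errors
```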

6.5.3 WebVMT Path Commands

A WebVMT path annotation command consists of one of the following components:

A WebVMT move command consists of a JSON text representing the following JSON object:

A WebVMT move parameter list is a JSON object representing the following components in any order:

A WebVMT fragment start latitude attribute consists of a WebVMT latitude attribute.

A WebVMT fragment start longitude attribute consists of a WebVMT longitude attribute.

A WebVMT fragment start altitude attribute consists of a WebVMT altitude attribute.

A WebVMT path attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "path".
  2. A U+003A COLON character (:).
  3. A JSON value consisting of a JSON string representing a WebVMT path identifier.

A WebVMT path identifier is any sequence of one or more characters not containing the substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), nor containing any U+000A LINE FEED (LF) characters or U+000D CARRIAGE RETURN (CR) characters.

Note: Path Identification

A WebVMT path identifier is a string which uniquely identifies a moving object in the WebVMT file, for example a camera.

A WebVMT line command consists of a JSON text representing the following JSON object:

A WebVMT line parameter list consists of a JSON object representing the following components in any order:

A WebVMT fragment end latitude attribute consists of a WebVMT latitude attribute.

A WebVMT fragment end longitude attribute consists of a WebVMT longitude attribute.

A WebVMT fragment end altitude attribute consists of a WebVMT altitude attribute.

A WebVMT fragment end time attribute consists of a WebVMT end time attribute.

A WebVMT fragment duration attribute consists of a WebVMT duration attribute.
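To illustrate how move and line commands combine, the following sketch reads command lines shaped like those in the examples of this document (one JSON text per line, keyed by "move-to" or "line-to") and groups the points by path identifier. The function name `collect_paths` is hypothetical, and grouping commands that have no "path" attribute under `None` is a choice of this sketch rather than defined behavior.

```python
import json


def collect_paths(command_lines):
    """Group move-to/line-to points into line strings, keyed by path identifier."""
    paths = {}
    for line in command_lines:
        command = json.loads(line)
        for name, params in command.items():
            if name not in ("move-to", "line-to"):
                continue  # other commands (e.g. circle) are not path fragments
            point = (params["lat"], params["lng"])
            # Assumption: commands without a "path" attribute share the key None.
            segments = paths.setdefault(params.get("path"), [])
            if name == "move-to" or not segments:
                segments.append([point])    # start a new line string
            else:
                segments[-1].append(point)  # extend the current line string
    return paths
```

For the "cam1" commands of a file such as the nested cues example in section 6.6.1, this yields a single line string with three points.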

6.5.4 WebVMT Synchronization Command

A WebVMT synchronized data command consists of a JSON text representing the following JSON object:

A WebVMT synchronized parameter list consists of a JSON object representing the following components in any order:

A WebVMT synchronized type attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "type".
  2. A U+003A COLON character (:).
  3. A JSON string representing a synchronized data type.

A WebVMT synchronized data attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "data".
  2. A U+003A COLON character (:).
  3. A JSON object representing a synchronized data object.

A WebVMT synchronized identifier attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "id".
  2. A U+003A COLON character (:).
  3. A JSON string representing a synchronized data identifier.

A WebVMT synchronized path attribute consists of a WebVMT path attribute representing a synchronized path identifier.

6.5.5 WebVMT Interpolation Subcommand

A WebVMT interpolation subcommand consists of a JSON text representing the following JSON object:

Note: Parent Command

The WebVMT interpolation subcommand refers to the attributes of its parent command. The parent command is the interpolation object.

A WebVMT interpolation parameter list consists of a JSON object consisting of the following components in any order:

A WebVMT interpolation target attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "to".
  2. A U+003A COLON character (:).
  3. A JSON object representing a WebVMT interpolation target parameter list.

A WebVMT interpolation target parameter list consists of a JSON object representing the interpolation attributes set to interpolation end values.

Note: Interpolation Target Omissions

Attributes of the interpolation object omitted from a WebVMT interpolation target parameter list are not affected by that subcommand.

A WebVMT end time attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "end".
  2. A U+003A COLON character (:).
  3. A JSON string representing a WebVMT timestamp.

By default, the WebVMT end time attribute is set to the WebVMT cue end time value.

Note: End Time

A WebVMT end time attribute represents the time at which a process ends.

A WebVMT duration attribute consists of a JSON text consisting of the following components in the order given:

  1. The JSON string "dur".
  2. A U+003A COLON character (:).
  3. A JSON string representing a WebVMT timespan.

Note: Duration

A WebVMT duration attribute represents the time interval for which a process lasts and supersedes the default value of the WebVMT end time attribute.

A WebVMT timespan is the positive time offset between two WebVMT timestamps and is represented in WebVMT timestamp format.
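The relationship between the end time attribute, the duration attribute and the cue end time can be sketched as a small resolution function. All times here are in seconds and are assumed to have been parsed from WebVMT timestamps already. The function name `resolve_end_time` is hypothetical. Two further assumptions are made by this sketch and not stated in the text above: the process is taken to start at `start` (for instance the cue start time), and supplying both "end" and "dur" is treated as an error.

```python
from typing import Optional


def resolve_end_time(
    start: float,
    cue_end: float,
    end: Optional[float] = None,
    dur: Optional[float] = None,
) -> float:
    """Return the time in seconds at which a process ends."""
    if end is not None and dur is not None:
        # Assumption of this sketch: the combination is rejected.
        raise ValueError("both end and dur supplied")
    if dur is not None:
        if dur <= 0:
            raise ValueError("a timespan must be positive")
        return start + dur   # duration supersedes the default end time
    if end is not None:
        return end           # explicit end time attribute
    return cue_end           # default: the cue end time
```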

6.6 Properties Of Cue Sequences

6.6.1 WebVMT File Using Only Nested Cues

A WebVMT file whose cues all comply with the following rule is said to be a WebVMT file using only nested cues.

Given any two cues cue1 and cue2 with start and end time offsets (x1, y1) and (x2, y2) respectively:

  • either cue1 lies fully within cue2, i.e. x1 >= x2 and y1 <= y2;
  • or cue1 fully contains cue2, i.e. x1 <= x2 and y1 >= y2.

The following example matches this definition:

Example 25: Nested Cues

    WEBVMT

    NOTE Required blocks omitted for clarity

    00:00.000 --> 01:24.000
    { "circle": { "lat": 0, "lng": 0, "rad": 2000 } }

    00:00.000 --> 00:44.000
    { "move-to": { "lat": 0, "lng": 0, "path": "cam1" } }
    { "line-to": { "lat": 0.12, "lng": 0.34, "path": "cam1" } }

    00:44.000 --> 01:19.000
    { "line-to": { "lat": 0.56, "lng": 0.78, "path": "cam1" } }

    01:24.000 --> 05:00.000
    { "circle": { "lat": 0, "lng": 0, "rad": 30000 } }

    01:35.000 --> 03:00.000
    { "move-to": { "lat": 0.87, "lng": 0.65, "path": "cam2" } }
    { "line-to": { "lat": 0.43, "lng": 0.21, "path": "cam2" } }

    03:00.000 --> 05:00.000
    { "line-to": { "lat": 0, "lng": 0, "path": "cam2" } }

Notice how you can express the cues in this WebVMT file as a tree structure:

  • 2km Circle at (0, 0)
    • Line (0, 0) to (0.12, 0.34)
    • Line (0.12, 0.34) to (0.56, 0.78)
  • 30km Circle at (0, 0)
    • Line (0.87, 0.65) to (0.43, 0.21)
    • Line (0.43, 0.21) to (0, 0)

If the file has cues that can’t be expressed in this fashion, then they don’t match the definition of a WebVMT file using only nested cues. For example:

Example 26: Non-Nested Cues

    WEBVMT

    NOTE Required blocks omitted for clarity

    00:00.000 --> 01:00.000
    { "move-to": { "lat": 0.12, "lng": 0.34, "path": "cam3" } }
    { "line-to": { "lat": 0.56, "lng": 0.78, "path": "cam3" } }

    00:30.000 --> 01:30.000
    { "move-to": { "lat": 0.87, "lng": 0.65, "path": "cam4" } }
    { "line-to": { "lat": 0.43, "lng": 0.21, "path": "cam4" } }

In this ninety-second example, the two cues partly overlap, with the first ending before the second ends and the second starting before the first ends. This therefore is not a WebVMT file using only nested cues.
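A check for this property can be sketched by testing every pair of cues for a partial overlap. The function names below are hypothetical. One assumption should be stated plainly: the rule as written only mentions containment, yet Example 25 includes sibling cues that do not overlap at all (such as 00:00 to 00:44 and 00:44 to 01:19), so this sketch also accepts disjoint pairs.

```python
from itertools import combinations


def partially_overlap(cue1, cue2) -> bool:
    """Return True if two (start, end) cues overlap without one containing the other."""
    (x1, y1), (x2, y2) = cue1, cue2
    within = x1 >= x2 and y1 <= y2      # cue1 lies fully within cue2
    contains = x1 <= x2 and y1 >= y2    # cue1 fully contains cue2
    disjoint = y1 <= x2 or y2 <= x1     # assumption: disjoint cues are acceptable
    return not (within or contains or disjoint)


def uses_only_nested_cues(cues) -> bool:
    """Check every pair of (start, end) cue timings, given in seconds."""
    return not any(partially_overlap(a, b) for a, b in combinations(cues, 2))
```

With timings in seconds, the cues of Example 25 pass this check, while the two cues of Example 26, (0, 60) and (30, 90), fail it.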

7. Parsing

WebVMT file parsing is similar to WebVTT parsing, though many of those steps can be skipped as WebVMT files are metadata files.

7.1 WebVMT File Parsing

A WebVMT parser, given an input byte stream, a text track list of cues output, and a collection of CSS style sheets stylesheets, must decode the byte stream using the UTF-8 decode algorithm, and then must parse the resulting string according to the WebVMT parser algorithm. This results in WebVMT cues being added to output, and CSS style sheets being added to stylesheets.

A WebVMT parser, specifically its conversion and parsing steps, is typically run asynchronously, with the input byte stream being updated incrementally as the resource is downloaded; this is called an incremental WebVMT parser.

A WebVMT parser verifies a file signature before parsing the provided byte stream. If the stream lacks this WebVMT file signature, then the parser aborts.

The WebVMT parser algorithm is as follows:

  1. Let input be the string being parsed, after conversion to Unicode, and with the following transformations applied:
    • Replace all U+0000 NULL characters by U+FFFD REPLACEMENT CHARACTERs.
    • Replace each U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pair by a single U+000A LINE FEED (LF) character.
    • Replace all remaining U+000D CARRIAGE RETURN characters by U+000A LINE FEED (LF) characters.
  2. Let position be a pointer into input, initially pointing at the start of the string. In an incremental WebVMT parser, when this algorithm (or further algorithms that it uses) moves the position pointer, the user agent must wait until appropriate further characters from the byte stream have been added to input before moving the pointer, so that the algorithm never reads past the end of the input string. Once the byte stream has ended, and all characters have been added to input, then the position pointer may, when so instructed by the algorithms, be moved past the end of input.
  3. Let seen cue be false.
  4. If input is less than six characters long, then abort these steps. The file does not start with the correct WebVMT file signature and was therefore not successfully processed.
  5. If input is exactly six characters long but does not exactly equal "WEBVMT", then abort these steps. The file does not start with the correct WebVMT file signature and was therefore not successfully processed.
  6. If input is more than six characters long but the first six characters do not exactly equal "WEBVMT", or the seventh character is not a U+0020 SPACE character, a U+0009 CHARACTER TABULATION (tab) character, or a U+000A LINE FEED (LF) character, then abort these steps. The file does not start with the correct WebVMT file signature and was therefore not successfully processed.
  7. Collect a sequence of code points that are not U+000A LINE FEED (LF) characters.
  8. If position is past the end of input, then abort these steps. The file was successfully processed, but it contains no useful data and so no WebVMT cues were added to output.
  9. The character indicated by position is a U+000A LINE FEED (LF) character. Advance position to the next character in input.
  10. If position is past the end of input, then abort these steps. The file was successfully processed, but it contains no useful data and so no WebVMT cues were added to output.
  11. Header: If the character indicated by position is not a U+000A LINE FEED (LF) character, then collect a WebVMT block with the in header flag set. Otherwise, advance position to the next character in input.
  12. Collect a sequence of code points that are U+000A LINE FEED (LF) characters.
  13. Let map be null.
  14. Let media metadata be null.
  15. Block loop: While position doesn’t point past the end of input:
    1. Collect a WebVMT block, and let block be the returned value.
    2. If block is a WebVMT cue, add block to the text track list of cues output.
    3. Otherwise, if block is a CSS style sheet, add block to stylesheets.
    4. Otherwise, if block is a WebVMT map object, let map be block.
    5. Otherwise, if block is a WebVMT media object, let media metadata be block.
    6. Collect a sequence of code points that are U+000A LINE FEED (LF) characters.
  16. End: The file has ended. Abort these steps. The WebVMT parser has finished. The file was successfully processed.
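Steps 1 and 4 to 6 of the algorithm above (input normalization and the signature check) can be sketched as follows. This is an illustration, not a full parser, and the function names `normalize_input` and `has_webvmt_signature` are assumptions of this example.

```python
def normalize_input(text: str) -> str:
    """Apply the three character transformations of step 1, in order."""
    text = text.replace("\x00", "\ufffd")  # NULL becomes REPLACEMENT CHARACTER
    text = text.replace("\r\n", "\n")      # CRLF pairs become a single LF
    return text.replace("\r", "\n")        # remaining CRs become LF


def has_webvmt_signature(text: str) -> bool:
    """Return True if normalized input passes steps 4 to 6."""
    if len(text) < 6:
        return False                       # step 4: too short
    if text[:6] != "WEBVMT":
        return False                       # steps 5 and 6: wrong signature
    if len(text) == 6:
        return True                        # exactly "WEBVMT"
    return text[6] in (" ", "\t", "\n")    # step 6: seventh character
```

Running the signature check on normalized input means a file that begins with "WEBVMT" followed by a CRLF pair is accepted, because the pair has already been replaced by a single LF.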

When the algorithm above says to collect a WebVMT block, optionally with a flag in header set, the user agent must run the following steps:

  1. Let input, position and seen cue be the same variables as those of the same name in the algorithm that invoked these steps.
  2. Let line count be zero.
  3. Let previous position be position.
  4. Let line be the empty string.
  5. Let buffer be the empty string.
  6. Let seen EOF be false.
  7. Let seen arrow be false.
  8. Let cue be null.
  9. Let stylesheet be null.
  10. Let map be null.
  11. Let media metadata be null.
  12. Loop: Run these substeps in a loop:
    1. Collect a sequence of code points that are not U+000A LINE FEED (LF) characters. Let line be those characters, if any.
    2. Increment line count by 1.
    3. If position is past the end of input, let seen EOF be true. Otherwise, the character indicated by position is a U+000A LINE FEED (LF) character; advance position to the next character in input.
    4. If line contains the three-character substring "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then run these substeps:
      1. If in header is not set and at least one of the following conditions is true:
        • line count is 1
        • line count is 2 and seen arrow is false
        ...then run these substeps:
        1. Let seen arrow be true.
        2. Let previous position be position.
        3. Cue creation: Let cue be a new WebVMT cue and initialize it as follows:
          1. Let cue's text track cue identifier be buffer.
          2. Let cue's text track cue pause-on-exit flag be false.
          3. Let cue's cue text be the empty string.
        4. Collect WebVMT cue timings from line for cue. If that fails, let cue be null. Otherwise, let buffer be the empty string and let seen cue be true.
        Otherwise, let position be previous position and break out of loop.
    5. Otherwise, if line is the empty string, break out of loop.
    6. Otherwise, run these substeps:
      1. If in header is not set and line count is 2, run these substeps:
        1. If seen cue is false and buffer starts with the substring "STYLE" (U+0053 LATIN CAPITAL LETTER S, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+004C LATIN CAPITAL LETTER L, U+0045 LATIN CAPITAL LETTER E), and the remaining characters in buffer (if any) are all ASCII whitespace, then run these substeps:
          1. Let stylesheet be the result of creating a CSS style sheet, with the following properties:

            location
            null
            parent CSS style sheet
            null
            owner node
            null
            owner CSS rule
            null
            media
            The empty string.
            title
            The empty string.
            alternate flag
            Unset.
            origin-clean flag
            Set.
          2. Let buffer be the empty string.
        2. Otherwise, if seen cue is false and buffer starts with the substring "MAP" (U+004D LATIN CAPITAL LETTER M, U+0041 LATIN CAPITAL LETTER A, U+0050 LATIN CAPITAL LETTER P), and the remaining characters in buffer (if any) are all ASCII whitespace, then run these substeps:
          1. Map creation: Let map be a new WebVMT map.
          2. Let buffer be the empty string.
        3. Otherwise, if seen cue is false and buffer starts with the substring "MEDIA" (U+004D LATIN CAPITAL LETTER M, U+0045 LATIN CAPITAL LETTER E, U+0044 LATIN CAPITAL LETTER D, U+0049 LATIN CAPITAL LETTER I, U+0041 LATIN CAPITAL LETTER A), and the remaining characters in buffer (if any) are all ASCII whitespace, then run these substeps:
          1. Media creation: Let media metadata be a new WebVMT media.
          2. Let buffer be the empty string.
      2. If buffer is not the empty string, append a U+000A LINE FEED (LF) character to buffer.
      3. Append line to buffer.
      4. Let previous position be position.
    7. If seen EOF is true, break out of loop.
  13. If cue is not null, let the cue text of cue be buffer, and return cue.
  14. Otherwise, if stylesheet is not null, then parse a stylesheet from buffer. If it returned a list of rules, assign the list as stylesheet's CSS rules; otherwise, set stylesheet's CSS rules to an empty list. Finally, return stylesheet.
  15. Otherwise, if map is not null, then collect WebVMT map settings from buffer using map for the results. Construct a WebVMT map object from map, and return it.
  16. Otherwise, if media metadata is not null, then collect WebVMT media settings from buffer using media metadata for the results. Construct a WebVMT media object from media metadata, and return it.
  17. Otherwise, return null.

7.2 WebVMT Map Settings Parsing

When the WebVMT parser algorithm says to collect WebVMT map settings from a string input for a text track, the user agent must run the following algorithm.

A WebVMT map object is a conceptual construct to represent a WebVMT map that is used as a root node for WebVMT node objects. This algorithm returns a WebVMT map object.

  1. Let settings be the result of splitting input on spaces.
  2. For each token setting in the list settings, run the following substeps:
    1. If setting does not contain a U+003A COLON character (:), or if the first U+003A COLON character (:) in setting is either the first or last character of setting, then jump to the step labeled next setting.
    2. Let name be the leading substring of setting up to and excluding the first U+003A COLON character (:) in that string.
    3. Let value be the trailing substring of setting starting from the character immediately after the first U+003A COLON character (:) in that string.
    4. Run the appropriate substeps that apply for the value of name, as follows:
      If name is a case-sensitive match for "lat"
      Let map's location latitude of the map center location be value.
      Otherwise if name is a case-sensitive match for "lng"
      Let map's location longitude of the map center location be value.
      Otherwise if name is a case-sensitive match for "alt"
      Let map's location altitude of the map center location be value.
      Otherwise if name is a case-sensitive match for "rad"
      Let map's map zoom radius be value.
    5. Next setting: Continue to the next setting, if any.
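The settings algorithm above can be sketched in a few lines. The function name `collect_map_settings` is hypothetical, and the result is returned as a plain dictionary rather than a WebVMT map object. Python's `str.split()` is used as a stand-in for splitting on spaces; it also splits on some non-ASCII whitespace characters, which is an approximation made by this sketch.

```python
MAP_SETTING_NAMES = ("lat", "lng", "alt", "rad")


def collect_map_settings(text: str) -> dict:
    """Collect name:value tokens for the recognized map setting names."""
    settings = {}
    for setting in text.split():
        colon = setting.find(":")
        # Skip tokens with no colon, or whose first colon is the
        # first or last character of the token.
        if colon <= 0 or colon == len(setting) - 1:
            continue
        name, value = setting[:colon], setting[colon + 1:]
        if name in MAP_SETTING_NAMES:   # case-sensitive match
            settings[name] = value      # stored as the raw string value
    return settings
```

Unrecognized names are ignored rather than rejected, mirroring the way the algorithm simply continues to the next setting. The media settings algorithm in section 7.3 tokenizes its input in the same way with a different set of names.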

7.3 WebVMT Media Settings Parsing

When the WebVMT parser algorithm says to collect WebVMT media settings from a string input for a text track, the user agent must run the following algorithm.

A WebVMT media object is a conceptual construct to represent a WebVMT media. This algorithm returns a WebVMT media object.

  1. Let settings be the result of splitting input on spaces.
  2. For each token setting in the list settings, run the following substeps:
    1. If setting does not contain a U+003A COLON character (:), or if the first U+003A COLON character (:) in setting is either the first or last character of setting, then jump to the step labeled next setting.
    2. Let name be the leading substring of setting up to and excluding the first U+003A COLON character (:) in that string.
    3. Let value be the trailing substring of setting starting from the character immediately after the first U+003A COLON character (:) in that string.
    4. Run the appropriate substeps that apply for the value of name, as follows:
      If name is a case-sensitive match for "url"
      Let media metadata's media URL be value.
      Otherwise if name is a case-sensitive match for "mime-type"
      Let media metadata's media MIME type be value.
      Otherwise if name is a case-sensitive match for "start-time"
      Let media metadata's media start time be value.
      Otherwise if name is a case-sensitive match for "path"
      Let media metadata's media path be value.
    5. Next setting: Continue to the next setting, if any.

7.4 WebVMT Cue Timings Parsing

When the algorithm above says to collect WebVMT cue timings from a string input for a WebVMT cue cue, the user agent must run the following algorithm.

  1. Let input be the string being parsed.
  2. Let position be a pointer into input, initially pointing at the start of the string.
  3. Skip whitespace.
  4. Collect a WebVMT timestamp. If that algorithm fails, then abort these steps and return failure. Otherwise, let cue's text track cue start time be the collected time.
  5. Skip whitespace.
  6. If the character at position is not a U+002D HYPHEN-MINUS character (-) then abort these steps and return failure. Otherwise, move position forwards one character.
  7. If the character at position is not a U+002D HYPHEN-MINUS character (-) then abort these steps and return failure. Otherwise, move position forwards one character.
  8. If the character at position is not a U+003E GREATER-THAN SIGN character (>) then abort these steps and return failure. Otherwise, move position forwards one character.
  9. Skip whitespace.
  10. If position is not past the end of input and the character at position is an ASCII digit, collect a WebVMT timestamp. If that algorithm fails, then abort these steps and return failure. Otherwise, let cue's text track cue end time be the collected time.
  11. Otherwise (position is past the end of input or the character at position is not an ASCII digit), let cue's text track cue end time be the value positive Infinity.

When this specification says that a user agent is to collect a WebVMT timestamp, the user agent must run the following steps:

  1. Let input and position be the same variables as those of the same name in the algorithm that invoked these steps.
  2. Let most significant units be minutes.
  3. If position is past the end of input, return an error and abort these steps.
  4. If the character indicated by position is not an ASCII digit, then return an error and abort these steps.
  5. Collect a sequence of code points that are ASCII digits, and let string be the collected substring.
  6. Interpret string as a base-ten integer. Let value1 be that integer.
  7. If string is not exactly two characters in length, or if value1 is greater than 59, let most significant units be hours.
  8. If position is beyond the end of input or if the character at position is not a U+003A COLON character (:), then return an error and abort these steps. Otherwise, move position forwards one character.
  9. Collect a sequence of code points that are ASCII digits, and let string be the collected substring.
  10. If string is not exactly two characters in length, return an error and abort these steps.
  11. Interpret string as a base-ten integer. Let value2 be that integer.
  12. If most significant units is hours, or if position is not beyond the end of input and the character at position is a U+003A COLON character (:), run these substeps:
    1. If position is beyond the end of input or if the character at position is not a U+003A COLON character (:), then return an error and abort these steps. Otherwise, move position forwards one character.
    2. Collect a sequence of code points that are ASCII digits, and let string be the collected substring.
    3. If string is not exactly two characters in length, return an error and abort these steps.
    4. Interpret string as a base-ten integer. Let value3 be that integer.
    Otherwise (if most significant units is not hours, and either position is beyond the end of input, or the character at position is not a U+003A COLON character (:)), let value3 have the value of value2, then value2 have the value of value1, then let value1 equal zero.
  13. If position is beyond the end of input or if the character at position is not a U+002E FULL STOP character (.), then return an error and abort these steps. Otherwise, move position forwards one character.
  14. Collect a sequence of code points that are ASCII digits, and let string be the collected substring.
  15. If string is not exactly three characters in length, return an error and abort these steps.
  16. Interpret string as a base-ten integer. Let value4 be that integer.
  17. If value2 is greater than 59 or if value3 is greater than 59, return an error and abort these steps.
  18. Let result be value1×60×60 + value2×60 + value3 + value4/1000.
  19. Return result.
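The two algorithms in this section can be sketched together in Python. This is an illustrative transcription: the function names are hypothetical, errors are signalled with `ValueError` instead of a failure value, and the position pointer is returned alongside the result instead of being shared state.

```python
import math

ASCII_DIGITS = "0123456789"
ASCII_WHITESPACE = " \t\n\f\r"


def _collect_digits(text, pos):
    """Collect a run of ASCII digits from pos; return (digits, new_pos)."""
    end = pos
    while end < len(text) and text[end] in ASCII_DIGITS:
        end += 1
    return text[pos:end], end


def _skip_whitespace(text, pos):
    while pos < len(text) and text[pos] in ASCII_WHITESPACE:
        pos += 1
    return pos


def _expect(text, pos, char):
    """Require char at pos and return the position after it."""
    if pos >= len(text) or text[pos] != char:
        raise ValueError(f"expected {char!r} at offset {pos}")
    return pos + 1


def collect_timestamp(text, pos=0):
    """Return (seconds, new_pos) for a timestamp starting at pos."""
    if pos >= len(text) or text[pos] not in ASCII_DIGITS:
        raise ValueError("timestamp must start with an ASCII digit")
    digits, pos = _collect_digits(text, pos)
    value1 = int(digits)
    hours = len(digits) != 2 or value1 > 59      # step 7
    digits, pos = _collect_digits(text, _expect(text, pos, ":"))
    if len(digits) != 2:
        raise ValueError("expected two digits")
    value2 = int(digits)
    if hours or (pos < len(text) and text[pos] == ":"):
        digits, pos = _collect_digits(text, _expect(text, pos, ":"))
        if len(digits) != 2:
            raise ValueError("expected two digits")
        value3 = int(digits)
    else:
        # No hours field: shift minutes and seconds down, hours become zero.
        value1, value2, value3 = 0, value1, value2
    digits, pos = _collect_digits(text, _expect(text, pos, "."))
    if len(digits) != 3:
        raise ValueError("expected three digits of milliseconds")
    if value2 > 59 or value3 > 59:
        raise ValueError("minutes and seconds must not exceed 59")
    return value1 * 3600 + value2 * 60 + value3 + int(digits) / 1000, pos


def collect_cue_timings(line):
    """Return (start, end) in seconds; end is infinity when omitted."""
    pos = _skip_whitespace(line, 0)
    start, pos = collect_timestamp(line, pos)
    pos = _skip_whitespace(line, pos)
    for char in "-->":
        pos = _expect(line, pos, char)
    pos = _skip_whitespace(line, pos)
    if pos < len(line) and line[pos] in ASCII_DIGITS:
        end, pos = collect_timestamp(line, pos)
    else:
        end = math.inf                           # unbounded cue
    return start, end
```

For example, `collect_cue_timings("00:44.000 --> 01:19.000")` returns `(44.0, 79.0)`, and a line with no end timestamp yields an end time of positive infinity.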

8. CSS Extensions

Note

This section specifies some CSS pseudo-elements and pseudo-classes and how they apply to WebVMT. This section does not apply to user agents that do not support CSS.

8.1 Introduction

This section is non-normative.

The ::cue pseudo-element represents a cue.

The ::cue(selector) pseudo-element represents a cue or element inside a cue that matches the given selector.

Note

Similarly to all other pseudo-elements, these pseudo-elements are not directly present in the <video> or <audio> element’s document tree.

A WebVMT node object is a conceptual construct used to represent components of cue metadata so that its processing can be described without reference to the underlying syntax.

Example 27
The following table shows examples of what can be selected with a given selector, together with WebVMT syntax to produce the relevant objects.

Selector: ::cue
CSS syntax example:

    video::cue {
      stroke: red;
    }

Matches: Any WebVMT node objects.
WebVMT syntax example:

    WEBVMT

    NOTE Red

    00:00:00.000 --> 00:00:08.000
    { "move-to": { "lat": 51.504362, "lng": -0.076153 } }
    { "line-to": { "lat": 51.506646, "lng": -0.074651 } }

    NOTE Also red!

    00:00:08.000 --> 00:00:16.000
    { "circle": { "lat": 51.504789, "lng": -0.078642, "rad": 20 } }

Selector: ID selector in ::cue()
CSS syntax example:

    video::cue(#cue1) {
      stroke: red;
    }

Matches: Any WebVMT node objects constructed for a cue with a text track cue identifier matching the given ID, e.g. cue1.
WebVMT syntax example:

    WEBVMT

    NOTE Red

    cue1
    00:00:00.000 --> 00:00:08.000
    { "move-to": { "lat": 51.504362, "lng": -0.076153 } }
    { "line-to": { "lat": 51.506646, "lng": -0.074651 } }

    NOTE Not red

    00:00:08.000 --> 00:00:16.000
    { "circle": { "lat": 51.504789, "lng": -0.078642, "rad": 20 } }

Selector: Attribute selector in ::cue()
CSS syntax example:

    video::cue([zone="safe1"]) {
      stroke: red;
    }

Matches: For "path", any WebVMT node object with the given path identifier; for "zone", the WebVMT node object with the given zone identifier.
WebVMT syntax example:

    WEBVMT

    NOTE Not red

    00:00:00.000 --> 00:00:08.000
    { "move-to": { "lat": 51.504362, "lng": -0.076153 } }
    { "line-to": { "lat": 51.506646, "lng": -0.074651 } }

    NOTE Red

    00:00:08.000 --> 00:00:16.000
    { "circle": { "lat": 51.504789, "lng": -0.078642, "rad": 20, "zone": "safe1" } }

8.2 Processing Model

Pseudo-elements apply to elements that are matched by selectors. For the purpose of this section, that element is the matched element. The pseudo-elements defined in the following sections affect the styling of parts of WebVMT cues that are being rendered for the matched element.

A CSS user agent that implements the text tracks model must implement the ::cue and ::cue(selector) pseudo-elements.

8.2.1 The ::cue pseudo-element

The ::cue pseudo-element (with no argument) matches any WebVMT node objects constructed for the matched element.

The following properties apply to the ::cue pseudo-element with no argument; other properties set on the pseudo-element must be ignored:

  • stroke
  • stroke-opacity
  • stroke-width
  • fill
  • fill-opacity

The ::cue(selector) pseudo-element with an argument must have an argument that consists of a CSS selector. It matches any WebVMT node object constructed for the matched element that also matches the given CSS selector.

The following properties apply to the ::cue() pseudo-element with an argument:

  • stroke
  • stroke-opacity
  • stroke-width
  • fill
  • fill-opacity

Properties that do not apply must be ignored.
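The rule that only five properties apply can be pictured as a filter over a set of declarations. The sketch below is purely illustrative: it models declarations as a dictionary of property names to values, which is an assumption of this example and not how a CSS engine represents them, and the name `filter_cue_declarations` is hypothetical.

```python
# The properties listed above for ::cue and ::cue(selector).
CUE_PROPERTIES = frozenset(
    {"stroke", "stroke-opacity", "stroke-width", "fill", "fill-opacity"}
)


def filter_cue_declarations(declarations: dict) -> dict:
    """Keep only declarations whose property applies to the cue pseudo-elements."""
    return {
        name: value
        for name, value in declarations.items()
        # CSS property names are ASCII case-insensitive.
        if name.lower() in CUE_PROPERTIES
    }
```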

8.2.2 CSS Cascades

For the purpose of determining the cascade of the declarations in STYLE blocks of a WebVMT file, the relative order of appearance of the style sheets must be the same order as they were added to the collection, and the order of appearance of the collection must be after any style sheets that apply to the associated <video> or <audio> element’s document.

Example 28: Cascade Precedence

For example, given the following (invalid) HTML document:

    <!doctype html>
    <title>Invalid cascade example</title>
    <video controls autoplay src="video.webm">
      <track default src="cascade.vmt">
    </video>
    <style>
      ::cue { fill: red; }
    </style>

And the "cascade.vmt" file contains:

    WEBVMT

    STYLE
    ::cue { fill: lime; }

    NOTE Red or green?

    00:00:00.000 --> 00:00:25.000
    { "circle": { "lat": 51.504789, "lng": -0.078642, "rad": 20 } }

The fill:lime declaration would win, because it is last in the cascade, even though the <style> element is after the <video> element in the document order.

9.Known Issues

This section is non-normative.

This section captures issues which have been identified, but are not yet fully documented.

Editor's note

As the specification develops, issues will be moved out of this section and included elsewhere in the document, until it is no longer needed and is completely removed.

9.1Planned Features

This section lists potential features which have been identified during the development process, but have not yet matured to a full design specification.

Note

Features which appear in this section warrant further investigation, but are not guaranteed to appear in the final specification.

9.1.1Markers

An image linked to and displayed at an offset from a geolocation.

9.1.2Labels

A text string linked to and displayed at an offset from a geolocation.

9.1.3Tile Shortcuts

Shortcuts to popular tile URLs for easy access and to help avoid URL syntax errors.

9.1.4Layers

Syntax to allow more than one layer of map tiles to be specified, e.g. 'map' and 'satellite' layers.

This should be functional, but remain lightweight.

9.1.5Multiple APIs

The current tech demo is based on the Leaflet API, but should be broadened to support other web map APIs, e.g. Open Layers.

A hot-swap feature would allow users to switch API on-the-fly to take advantage of the unique features supported by different APIs, e.g. Street View.

9.1.6Camera Direction

Camera orientation may not match the direction of travel, or may be dynamic, e.g. for Augmented Reality. Field of view and zoom level also affect video frame content and may vary over time.

9.1.7 Co-ordinate Reference Systems

Although originally conceived for Earth-based use, spatial data in other environments could be accommodated by specifying the co-ordinate reference system. For example, this could describe a location on another planet, e.g. Mars, or in an artificial environment, e.g. a video game.

9.1.8 Moving Objects

WebVMT paths represent objects moving through the mapped space, and could be extended through a defined API to support properties associated with motion, such as distance travelled, speed and heading.

WebVMT zones represent regions in the mapped space, and could be extended to support WebVMT path properties for the motion of their centroid, and to include dynamic properties such as area and volume.

Care should be taken to build a lightweight interface which includes simple, common properties that are useful to most use cases and avoids overloading with unnecessary edge cases, processing overheads and complexity.
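As an illustration of the kind of motion properties mentioned above, the following sketch derives distance, speed and heading from two timed path samples. It is not part of any defined WebVMT API: the function names, the { time, lat, lng } sample shape (time in seconds) and the use of a spherical Earth approximation are all assumptions made for this example.

```javascript
// Hypothetical helpers for illustration; WebVMT does not define this API.
const EARTH_RADIUS_M = 6371008.8; // mean Earth radius, spherical approximation

const toRadians = (degrees) => (degrees * Math.PI) / 180;
const toDegrees = (radians) => (radians * 180) / Math.PI;

// Great-circle distance in metres between two { lat, lng } points (haversine formula).
function distanceMetres(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLng / 2) ** 2;
  // Clamp to 1 to guard against rounding error before taking the arcsine.
  return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Initial bearing from a to b, in degrees clockwise from north, in the range [0, 360).
function headingDegrees(a, b) {
  const lat1 = toRadians(a.lat);
  const lat2 = toRadians(b.lat);
  const dLng = toRadians(b.lng - a.lng);
  const y = Math.sin(dLng) * Math.cos(lat2);
  const x = Math.cos(lat1) * Math.sin(lat2) - Math.sin(lat1) * Math.cos(lat2) * Math.cos(dLng);
  return (toDegrees(Math.atan2(y, x)) + 360) % 360;
}

// Summarise the motion between two timed samples { time, lat, lng }.
function segmentMotion(from, to) {
  const distance = distanceMetres(from, to);
  const elapsed = to.time - from.time;
  return {
    distance,                                    // metres
    speed: elapsed > 0 ? distance / elapsed : 0, // metres per second
    heading: headingDegrees(from, to),           // degrees from north
  };
}
```

For example, segmentMotion({ time: 0, lat: 0, lng: 0 }, { time: 10, lat: 0, lng: 1 }) describes a segment of roughly 111 km heading due east. Keeping the result to three simple properties reflects the lightweight interface suggested above.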

9.1.9 Height Reference

In addition to height above the WGS84 ellipsoid, an option could be added to measure altitude from mean sea level, e.g. for an aircraft, using a suitable Earth Gravitational Model (EGM) or from ground level, e.g. for the height of a structure.
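The relationship between these height references can be sketched as simple subtractions. The function names below are hypothetical, and the geoid undulation (the height of the geoid above the ellipsoid at a given location) is assumed to come from a separate geoid model lookup, such as an EGM grid, which is not shown.

```javascript
// Hypothetical helpers for illustration; all heights are in metres.

// Height above mean sea level = height above the WGS84 ellipsoid
// minus the geoid undulation at that location.
function ellipsoidToMeanSeaLevel(ellipsoidHeight, geoidUndulation) {
  return ellipsoidHeight - geoidUndulation;
}

// Height above ground level = height above the ellipsoid
// minus the ellipsoidal height of the ground directly below.
function ellipsoidToGroundLevel(ellipsoidHeight, groundEllipsoidHeight) {
  return ellipsoidHeight - groundEllipsoidHeight;
}
```

With purely illustrative figures, an object at 120 m above the ellipsoid where the undulation is 45 m would be 75 m above mean sea level.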

9.2 Planned Interfaces

This section lists interfaces which have been identified during the development process, but have not yet matured to a full design specification.

9.2.1 DataCue Interface

Expose WebVMT cues in the DOM API, based on the DataCue API proposed in WICG.

This is analogous to the VTTCue interface.

cue = new DataCue(startTime, endTime, value, type);
Returns a new DataCue object.
cue.value
Returns the metadata value of a WebVMT cue. Can be set.
cue.type
Returns the metadata type of a WebVMT cue.
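Because DataCue is still a proposal and may not be available in a given user agent, the following sketch uses a hypothetical stand-in class, DataCueSketch, to show how the members described above might behave. The class name and the type string 'org.example.webvmt' are assumptions made for this example and are not defined by WebVMT or by the DataCue proposal.

```javascript
// Hypothetical stand-in for illustration only; not the DataCue interface itself.
class DataCueSketch {
  #type;

  constructor(startTime, endTime, value, type) {
    this.startTime = startTime; // seconds
    this.endTime = endTime;     // seconds
    this.value = value;         // metadata value, can be set
    this.#type = type;          // metadata type, exposed read-only below
  }

  get type() {
    return this.#type;
  }
}

// A cue carrying the circle from the cascade example as its metadata value.
const exampleCue = new DataCueSketch(
  0,
  25,
  { circle: { lat: 51.504789, lng: -0.078642, rad: 20 } },
  'org.example.webvmt' // hypothetical type identifier
);

// The value can be replaced after construction.
exampleCue.value = { circle: { lat: 51.504789, lng: -0.078642, rad: 40 } };
```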

9.2.2 VMTMap Interface

Expose a WebVMT map in the DOM API.

WebIDL
[Exposed=Window]
interface VMTMap {
  constructor(double centerLatitude, double centerLongitude,
              optional double centerAltitude, double zoomRadius);
  attribute double centerLatitude;
  attribute double centerLongitude;
  attribute double centerAltitude;
  attribute double zoomRadius;
  object getMap();
};

This is analogous to the VTTRegion interface.

map = new VMTMap(centerLatitude, centerLongitude, centerAltitude, zoomRadius);
Returns a new VMTMap object.
map.centerLatitude
Returns the location latitude of the map center location. Can be set. Throws a RangeError if the new value is not in the range -90..90.
map.centerLongitude
Returns the location longitude of the map center location. Can be set. Throws a RangeError if the new value is not in the range -180..180.
map.centerAltitude
Returns the location altitude of the map center location. Can be set.
map.zoomRadius
Returns the map zoom radius. Can be set. Throws a RangeError if the new value is not positive.
map.getMap()
Returns the map interface object.
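The range checks described above could be sketched as follows. VMTMapSketch is a hypothetical stand-in rather than the VMTMap interface itself, and the extra mapObject constructor argument is an assumption introduced so that getMap() has something to return in this example.

```javascript
// Hypothetical stand-in for illustration only; not the VMTMap interface itself.
class VMTMapSketch {
  #centerLatitude;
  #centerLongitude;
  #zoomRadius;
  #mapObject;

  // mapObject is an assumption: the object provided by the underlying web map API.
  constructor(centerLatitude, centerLongitude, centerAltitude, zoomRadius, mapObject = null) {
    this.centerLatitude = centerLatitude;   // validated by the setter below
    this.centerLongitude = centerLongitude; // validated by the setter below
    this.centerAltitude = centerAltitude;   // no range restriction is described
    this.zoomRadius = zoomRadius;           // validated by the setter below
    this.#mapObject = mapObject;
  }

  get centerLatitude() { return this.#centerLatitude; }
  set centerLatitude(value) {
    // The negated comparison also rejects NaN.
    if (!(value >= -90 && value <= 90)) {
      throw new RangeError('centerLatitude must be in the range -90..90');
    }
    this.#centerLatitude = value;
  }

  get centerLongitude() { return this.#centerLongitude; }
  set centerLongitude(value) {
    if (!(value >= -180 && value <= 180)) {
      throw new RangeError('centerLongitude must be in the range -180..180');
    }
    this.#centerLongitude = value;
  }

  get zoomRadius() { return this.#zoomRadius; }
  set zoomRadius(value) {
    if (!(value > 0)) {
      throw new RangeError('zoomRadius must be positive');
    }
    this.#zoomRadius = value;
  }

  getMap() { return this.#mapObject; }
}
```

Routing the constructor arguments through the setters means an out-of-range value is rejected at construction time as well as on later assignment, and a rejected assignment leaves the previous value in place.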

10. IANA Considerations

10.1 text/vmt

This registration is for community review and will be submitted to the IESG for review, approval, and registration with IANA.

Type name:
text
Subtype name:
vmt
Required parameters:
No parameters
Optional parameters:
No parameters
Encoding considerations:
8bit (always UTF-8)
Security considerations:
Text track files themselves pose no immediate risk unless sensitive information is included within the data. Implementations, however, are required to follow specific rules when processing text tracks, to ensure that certain origin-based restrictions are honored. Failure to correctly implement these rules can result in information leakage, cross-site scripting attacks, and the like.
Interoperability considerations:
Rules for processing both conforming and non-conforming content are defined in this specification.
Published specification:
This document is the relevant specification.
Applications that use this media type:
Web browsers, other media players and location-aware video devices such as drones, dashcams and smartphones.
Additional information:
Magic number(s):
WebVMT files all begin with one of the following byte sequences (where "EOF" means the end of the file):
  • EF BB BF 57 45 42 56 4D 54 0A
  • EF BB BF 57 45 42 56 4D 54 0D
  • EF BB BF 57 45 42 56 4D 54 20
  • EF BB BF 57 45 42 56 4D 54 09
  • EF BB BF 57 45 42 56 4D 54 EOF
  • 57 45 42 56 4D 54 0A
  • 57 45 42 56 4D 54 0D
  • 57 45 42 56 4D 54 20
  • 57 45 42 56 4D 54 09
  • 57 45 42 56 4D 54 EOF
Note: Magic Numbers

(An optional UTF-8 BOM, the ASCII string "WEBVMT", and finally a space, tab, line break, or the end of the file.)

File extension(s):
"vmt"
Macintosh file type code(s):
No specific Macintosh file type codes are recommended for this type.
Person & email address to contact for further information:
Rob Smith <rob.smith@awayteam.co.uk>
Intended usage:
Common
Restrictions on usage:
No restrictions apply.
Authors:
Rob Smith <rob.smith@awayteam.co.uk>
Change controller:
W3C

Fragment identifiers have no meaning with text/vmt resources.
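The magic number sequences listed in the registration above can be checked with a short byte comparison. The following is a minimal sketch, assuming the file content is available as a Uint8Array; the function names are hypothetical and not part of this specification.

```javascript
// Hypothetical helpers for illustration; checks only the leading file signature.
const UTF8_BOM = [0xef, 0xbb, 0xbf];
const SIGNATURE = [0x57, 0x45, 0x42, 0x56, 0x4d, 0x54]; // ASCII "WEBVMT"
const TERMINATORS = [0x0a, 0x0d, 0x20, 0x09];           // LF, CR, space, tab

// True if the bytes at the given offset match the prefix exactly.
function matchesAt(bytes, prefix, offset) {
  return prefix.every((byte, index) => bytes[offset + index] === byte);
}

// True if the bytes begin with an optional UTF-8 BOM, then "WEBVMT",
// then a space, tab, line break, or the end of the file.
function hasWebVMTSignature(bytes) {
  const offset = matchesAt(bytes, UTF8_BOM, 0) ? UTF8_BOM.length : 0;
  if (!matchesAt(bytes, SIGNATURE, offset)) {
    return false;
  }
  const next = offset + SIGNATURE.length;
  return next === bytes.length || TERMINATORS.includes(bytes[next]);
}
```

A match only indicates that the file starts with a WebVMT signature; it does not show that the rest of the file is well-formed.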

A. Privacy and Security Considerations

A.1 Text-Based Format Security

As with any text-based format, it is possible to construct malicious content that might cause buffer over-runs, value overflows (e.g. string representations of integers that overflow a given word length), and the like. Implementers should take care that over-long lines, field values, or encoded values do not cause security problems in their parser.
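One defensive approach is to bound input sizes and values before using them. The sketch below is illustrative only: the limits, the function names and the plain decimal pattern are assumptions chosen for this example, and are not taken from the WebVMT syntax rules.

```javascript
// Hypothetical limits and helpers for illustration only.
const MAX_LINE_LENGTH = 65536; // assumed cap on a single line, in UTF-16 code units
const MAX_NUMBER_LENGTH = 32;  // assumed cap on a numeric field
const DECIMAL_PATTERN = /^-?\d+(\.\d+)?$/; // assumed plain decimal notation

// Reject over-long lines before any further processing.
function checkLineLength(line) {
  if (line.length > MAX_LINE_LENGTH) {
    throw new RangeError('line exceeds maximum length');
  }
  return line;
}

// Parse a numeric field, rejecting over-long, malformed or out-of-range text.
function parseBoundedNumber(text, min, max) {
  if (text.length > MAX_NUMBER_LENGTH || !DECIMAL_PATTERN.test(text)) {
    throw new SyntaxError('malformed numeric field');
  }
  const value = Number(text);
  // The negated comparison also rejects NaN.
  if (!(value >= min && value <= max)) {
    throw new RangeError('numeric field out of range');
  }
  return value;
}
```

Checking the length before applying the pattern keeps the cost of rejecting hostile input small, and the explicit range check avoids passing extreme values on to later processing.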

A.2 Styling-Related Privacy and Security

WebVMT can embed CSS style sheets, which will be applied in user agents that support CSS. Under these circumstances, the privacy and security considerations of CSS apply, with the following caveats.

Such style sheets cannot fetch any external resources, and it is important for privacy that user agents do not allow this. Otherwise, WebVMT files could be authored such that a third party is notified when the user watches a particular video, and even the current time in that video.

It is possible for a user agent to offer user style sheets, but their presence and nature will not be detectable by scripts running in the same user agent (e.g. browser) since the CSS object model for such style sheets is not exposed to script and there is no way to get the computed style for pseudo-elements other than ::before and ::after with the getComputedStyle() API.

A.3 Scripting-Related Security

WebVMT does not include or enable scripting. It is important that user agents do not support a way to execute script embedded in a WebVMT file.

However, it is possible to construct and deliver a file that is designed not to present timed metadata, but instead to provide timed input (‘triggers’) to a script system. A poorly-written script or script system might then cause security, privacy or other problems; however, this consideration really applies to the script system. Since WebVMT supplies these triggers at their timestamps, a malicious file might present such triggers very rapidly, perhaps causing undue resource consumption.

A.4 Location-Related Security

WebVMT provides a common format in which to share location data synchronized with video for the web. Proper consideration should be given to any sensitive details that may be revealed as a result of sharing such personal information. For example, posting a geotagged image online can reveal the location of the content creator at a particular time, from which their absence from distant locations can be inferred, since travelling takes time. People who appear in video frames may be identified visually, which also reveals their presence at a nearby location or their absence from a distant one.

In order to share content responsibly, users should consider:

  1. What is included?
  2. How can it be used?
  3. What can it tell others?

Further guidance for users, developers and regulators can be found in the W3C Note: The Responsible Use of Spatial Data.

B. References

B.1 Normative references

[CSS-CASCADE-4]
CSS Cascading and Inheritance Level 4. Elika Etemad; Tab Atkins Jr. W3C. 13 January 2022. W3C Candidate Recommendation. URL: https://www.w3.org/TR/css-cascade-4/
[CSS-SYNTAX-3]
CSS Syntax Module Level 3. Tab Atkins Jr.; Simon Sapin. W3C. 24 December 2021. W3C Candidate Recommendation. URL: https://www.w3.org/TR/css-syntax-3/
[CSS22]
Cascading Style Sheets Level 2 Revision 2 (CSS 2.2) Specification. Bert Bos. W3C. 12 April 2016. W3C Working Draft. URL: https://www.w3.org/TR/CSS22/
[CSSOM-1]
CSS Object Model (CSSOM). Daniel Glazman; Emilio Cobos Álvarez. W3C. 26 August 2021. W3C Working Draft. URL: https://www.w3.org/TR/cssom-1/
[ECMA-404]
The JSON Data Interchange Format, 2nd edition. Ecma International. 1 December 2017. Standard. URL: https://www.ecma-international.org/wp-content/uploads/ECMA-404_2nd_edition_december_2017.pdf
[HTML51]
HTML 5.1 2nd Edition. Steve Faulkner; Arron Eicholz; Travis Leithead; Alex Danilo. W3C. 28 January 2021. W3C Recommendation. URL: https://www.w3.org/TR/html51/
[RESPONSIBLE-USE-SPATIAL]
The Responsible Use of Spatial Data. Joseph Abhayaratna; Ed Parsons. W3C. 27 May 2021. W3C Working Group Note. URL: https://www.w3.org/TR/responsible-use-spatial/
[RFC3629]
UTF-8, a transformation format of ISO 10646. F. Yergeau. IETF. November 2003. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc3629
[SELECTORS4]
Selectors Level 4. Elika Etemad; Tab Atkins Jr. W3C. 11 November 2022. W3C Working Draft. URL: https://www.w3.org/TR/selectors-4/
[WEBIDL-1]
Web IDL Standard. Edgar Chen; Timothy Gu. WHATWG. Living Standard. URL: https://webidl.spec.whatwg.org/
[WEBVTT]
WebVTT: The Web Video Text Tracks Format. Silvia Pfeiffer. W3C. 4 April 2019. W3C Candidate Recommendation. URL: https://www.w3.org/TR/webvtt1/

B.2 Informative references

[RFC3986]
Uniform Resource Identifier (URI): Generic Syntax. T. Berners-Lee; R. Fielding; L. Masinter. IETF. January 2005. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc3986
[SVG11]
Scalable Vector Graphics (SVG) 1.1 (Second Edition). Erik Dahlström; Patrick Dengler; Anthony Grasso; Chris Lilley; Cameron McCormack; Doug Schepers; Jonathan Watt; Jon Ferraiolo; Jun Fujisawa; Dean Jackson et al. W3C. 16 August 2011. W3C Recommendation. URL: https://www.w3.org/TR/SVG11/
[WEBIDL]
Web IDL Standard. Edgar Chen; Timothy Gu. WHATWG. Living Standard. URL: https://webidl.spec.whatwg.org/

