W3C

EMMA: Extensible MultiModal Annotation markup language Version 1.1

W3C Working Draft 27 June 2013

This version:
http://www.w3.org/TR/2013/WD-emma11-20130627/
Latest version:
http://www.w3.org/TR/emma11/
Previous version:
http://www.w3.org/TR/2012/WD-emma11-20120209/
Editor:
Michael Johnston (AT&T)
Authors:
Paolo Baggia (while at Loquendo, currently Nuance Communications)
Michael Bodell (until May 2012, while at Microsoft)
Daniel C. Burnett (Voxeo)
Deborah A. Dahl (W3C Invited Expert)

Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.


Abstract

The W3C Multimodal Interaction Working Group aims to develop specifications to enable access to the Web using multimodal interaction. This document is part of a set of specifications for multimodal systems, and provides details of an XML markup language for containing and annotating the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors for use by components that act on the user's inputs such as interaction managers.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the 27 June 2013 Second Public Working Draft of "EMMA: Extensible MultiModal Annotation markup language Version 1.1". It has been produced by the Multimodal Interaction Working Group, which is part of the Multimodal Interaction Activity.

This specification describes markup for representing interpretations of user input (speech, keystrokes, pen input etc.) together with annotations for confidence scores, timestamps, input medium etc., and forms part of the proposals for the W3C Multimodal Interaction Framework.

The EMMA: Extensible Multimodal Annotation 1.0 specification was published as a W3C Recommendation in February 2009. Since then there have been numerous implementations of the standard, and extensive feedback has come in regarding desired new features and clarifications requested for existing features. The W3C Multimodal Interaction Working Group examined a range of different use cases for extensions of the EMMA specification and published a W3C Note on Use Cases for Possible Future EMMA Features [EMMA Use Cases]. In this working draft of EMMA 1.1, we have developed a set of new features based on feedback from implementers and have also added clarification text in a number of places throughout the specification. The new features include: support for adding human annotations (emma:annotation, emma:annotated-tokens), support for inline specification of process parameters (emma:parameters, emma:parameter, emma:parameter-ref), support for specification of models used in processing beyond grammars (emma:process-model, emma:process-model-ref), extensions to emma:grammar to enable inline specification of grammars, a new mechanism for indicating which grammars are active (emma:grammar-active, emma:active), support for non-XML semantic payloads (emma:result-format), support for multiple emma:info elements and reference to the emma:info relevant to an interpretation (emma:info-ref), and a new attribute to complement the emma:medium and emma:mode attributes that enables specification of the modality used to express an input (emma:expressed-through).

The changes from the last working draft are:

Changes from EMMA 1.0 can be found in Appendix F.

Comments are welcome on www-multimodal@w3.org (archive). See W3C mailing list and archive usage guidelines.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

The sections in the main body of this document are normative unless otherwise specified. The appendices in this document are informative unless otherwise indicated explicitly.

Conventions of this Document

All sections in this specification are normative, unless otherwise indicated. The informative parts of this specification are identified by "Informative" labels within sections.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Table of Contents

1. Introduction

This section is Informative.

This document presents an XML specification for EMMA, an Extensible MultiModal Annotation markup language, responding to the requirements documented in Requirements for EMMA [EMMA Requirements]. This markup language is intended for use by systems that provide semantic interpretations for a variety of inputs, including but not necessarily limited to, speech, natural language text, GUI and ink input.

It is expected that this markup will be used primarily as a standard data interchange format between the components of a multimodal system; in particular, it will normally be automatically generated by interpretation components to represent the semantics of users' inputs, not directly authored by developers.

The language is focused on annotating single inputs from users, which may be either from a single mode or a composite input combining information from multiple modes, as opposed to information that might have been collected over multiple turns of a dialog. The language provides a set of elements and attributes that are focused on enabling annotations on user inputs and interpretations of those inputs.

An EMMA document can be considered to hold three types of data:

  • instance data: application-specific markup corresponding to the interpretation of the user input;
  • data model: constraints on the structure and content of an instance;
  • metadata: annotations associated with the data contained in the instance.

Given the assumptions above about the nature of data represented in an EMMA document, the following general principles apply to the design of EMMA:

The annotations of EMMA should be considered 'normative' in the sense that if an EMMA component produces annotations as described in Section 3 and Section 4, these annotations must be represented using the EMMA syntax. The Multimodal Interaction Working Group may address in later drafts the issues of modularization and profiling; that is, which sets of annotations are to be supported by which classes of EMMA component.

1.1 Uses of EMMA

The general purpose of EMMA is to represent information automatically extracted from a user's input by an interpretation component, where input is to be taken in the general sense of a meaningful user input in any modality supported by the platform. The reader should refer to the sample architecture in the W3C Multimodal Interaction Framework [MMI Framework], which shows EMMA conveying content between user input modality components and an interaction manager.

Components that generate EMMA markup include:

  1. Speech recognizers
  2. Handwriting recognizers
  3. Natural language understanding engines
  4. Other input media interpreters (e.g. DTMF, pointing, keyboard)
  5. Multimodal integration component

Components that use EMMA include:

  1. Interaction manager
  2. Multimodal integration component

Although not a primary goal of EMMA, a platform may also choose to use this general format as the basis of a general semantic result that is carried along and filled out during each stage of processing. In addition, future systems may also potentially make use of this markup to convey abstract semantic content to be rendered into natural language by a natural language generation component.

1.2 Terminology

anchor point
When referencing an input interval with emma:time-ref-uri, emma:time-ref-anchor-point allows you to specify whether the referenced anchor is the start or end of the interval.
annotation
Information about the interpreted input, for example, timestamps, confidence scores, links to raw input, etc.
composite input
An input formed from several pieces, often in different modes, for example, a combination of speech and pen gesture, such as saying "zoom in here" and circling a region on a map.
confidence
A numerical score describing the degree of certainty in a particular interpretation of user input.
data model
For EMMA, a data model defines a set of constraints on possible interpretations of user input.
derivation
Interpretations of user input are said to be derived from that input, and higher level interpretations may be derived from lower level ones. EMMA allows you to reference the user input or interpretation a given interpretation was derived from, see semantic interpretation.
dialog
For EMMA, dialog can be considered as a sequence of interactions between a user and the application.
endpoint
In EMMA, this refers to a network location which is the source or recipient of an EMMA document. It should be noted that the usage of the term "endpoint" in this context is different from the way that the term is used in speech processing, where it refers to the end of a speech input.
gestures
In multimodal applications gestures are communicative acts made by the user or application. An example is circling an area on a map to indicate a region of interest. Users may be able to gesture with a pen, keystrokes, hand movements, head movements, or sound. Gestures often form part of composite input. Application gestures are typically animations and/or sound effects.
grammar
A set of rules that describe a sequence of tokens expected in a given input. These can be used by speech and handwriting recognizers to increase recognition accuracy.
handwriting recognition
The process of converting pen strokes into text.
ink recognition
This includes the recognition of handwriting and pen gestures.
input cost
In EMMA, this refers to a numerical measure indicating the weight or processing cost associated with a user's input or part of their input.
input device
The device providing a particular input, for example, a microphone, a pen, a mouse, a camera, or a keyboard.
input function
In EMMA, this refers to the use a particular input is serving, for example, as part of a recording or transcription, as part of a dialog, or as a means to verify the user's identity.
input medium
Whether the input is acoustic, visual, or tactile; for instance, a spoken utterance is an example of an aural input, a hand gesture as seen by a camera is an example of a visual input, and pointing with a mouse or pen is an example of a tactile input.
input mode
This distinguishes a particular means of providing an input within a general input medium, for example, speech, DTMF, ink, keystrokes, video, photograph, etc.
input source
This is the device that provided the input, for example a particular microphone or camera. EMMA allows you to identify these with a URI.
input tokens
In EMMA, this refers to a sequence of characters, words or other discrete units of input.
instance data
A representation in XML of an interpretation of user input.
interaction manager
A processor that determines how an application interacts with a user. This can be at multiple levels of abstraction, for example, at a detailed level, determining what prompts to present to the user and what actions to take in response to user input, versus a higher level treatment in terms of goals and tasks for achieving those goals. Interaction managers are frequently event driven.
interpretation
In EMMA, an interpretation of user input refers to information derived from the user input that is meaningful to the application.
keystroke input
Input provided by the user pressing on a sequence of keys (buttons), such as a computer keyboard or keypad.
lattice
A set of nodes interconnected with directed arcs such that by following an arc, you can never find yourself back at a node you have already visited (i.e. a directed acyclic graph). Lattices provide a flexible means to represent the results of speech and handwriting recognition, in terms of arcs representing words or character sequences. Different arcs from the same node represent different local hypotheses as to what the user said or wrote.
metadata
Information describing another set of data, for instance, a library catalog card with information on the author, title and location of a book. EMMA is designed to support input processors in providing metadata for interpretations of user input.
multimodal integration
The process of combining inputs from different modes to create an interpretation of composite input. This is also sometimes referred to as multimodal fusion.
multimodal interaction
The means for a user to interact with an application using more than one mode of interaction, for instance, offering the user the choice of speaking or typing, or in some cases, allowing the user to provide a composite input involving multiple modes.
natural language understanding
The process of interpreting text in terms that are useful for an application.
N-best list
An N-best list is a list of the most likely hypotheses for what the user actually said or wrote, where N stands for an integral number such as 5 for the 5 most likely hypotheses.
raw signal
An uninterpreted input, such as an audio waveform captured from a microphone.
semantic interpretation
A normalized representation of the meaning of a user input, for instance, mapping the speech for "San Francisco" into the airport code "SFO".
semantic processor
In EMMA, this refers to systems that can derive interpretations of user input, for instance, mapping the speech for "San Francisco" into the airport code "SFO".
signal interpretation
The process of mapping a discrete or continuous signal into a symbolic representation that can be used by an application, for instance, transforming the audio waveform corresponding to someone saying "2005" into the number 2005.
speech recognition
The process of determining the textual transcription of a piece of speech.
speech synthesis
The process of rendering a piece of text into the corresponding speech, i.e. synthesizing speech from text.
text to speech
The process of rendering a piece of text into the corresponding speech.
time stamp
The time that a particular input or part of an input began or ended.
URI: Uniform Resource Identifier
A URI is a unifying syntax for the expression of names and addresses of objects on the network as used in the World Wide Web. Within this specification, the term URI refers to a Universal Resource Identifier as defined in [RFC3986] and extended in [RFC3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications. A URI is defined as any legal anyURI primitive as defined in XML Schema Part 2: Datatypes Second Edition Section 3.2.17 [SCHEMA2].
user input
An input provided by a user as opposed to something generated automatically.

2. Structure of EMMA documents

This section is Informative.

As noted above, the main components of an interpreted user input in EMMA are the instance data, an optional data model, and the metadata annotations that may be applied to that input. The realization of these components in EMMA is as follows:

An EMMA interpretation is the primary unit for holding user input as interpreted by an EMMA processor. As will be seen below, multiple interpretations of a single input are possible.

EMMA provides a simple structural syntax for the organization of interpretations and instances, and an annotative syntax to apply the annotation to the input data at different levels.

An outline of the structural syntax and annotations found in EMMA documents is as follows. A fuller definition may be found in the description of individual elements and attributes in Section 3 and Section 4.

From the defined root node emma:emma, the structure of an EMMA document consists of a tree of EMMA container elements (emma:one-of, emma:sequence, emma:group) terminating in a number of interpretation elements (emma:interpretation). The emma:interpretation elements serve as wrappers for either application namespace markup describing the interpretation of the user's input, or an emma:lattice element, or an emma:literal element. A single emma:interpretation may also appear directly under the root node.

The EMMA elements emma:emma, emma:interpretation, emma:one-of, and emma:literal and the EMMA attributes emma:no-input, emma:uninterpreted, emma:medium, and emma:mode are required of all implementations. The remaining elements and attributes are optional and may be used in some implementations and not others depending on the specific modalities and processing being represented.
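For instance, a minimal document using only the required elements and attributes might look like the following sketch (the id value and literal content here are illustrative, not taken from the specification):

<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice">
    <emma:literal>boston</emma:literal>
  </emma:interpretation>
</emma:emma>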

To illustrate this, here is an example of an EMMA document representing input to a flight reservation application. In this example there are two speech recognition results and associated semantic representations of the input. The system is uncertain whether the user meant "flights from Boston to Denver" or "flights from Austin to Denver". The annotations to be captured are timestamps and confidence scores for the two inputs.

Example:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:confidence="0.75"
        emma:tokens="flights from boston to denver">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.68"
        emma:tokens="flights from austin to denver">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

Attributes on the root emma:emma element indicate the version and namespace. The emma:emma element contains an emma:one-of element which contains a disjunctive list of possible interpretations of the input. The actual semantic representation of each interpretation is within the application namespace. In the example here the application specific semantics involves elements origin and destination indicating the origin and destination cities for looking up a flight. The timestamp is the same for both interpretations and it is annotated using values in milliseconds in the emma:start and emma:end attributes on the emma:one-of. The confidence scores and tokens associated with each of the inputs are annotated using the EMMA annotation attributes emma:confidence and emma:tokens on each of the emma:interpretation elements.

Attributes in EMMA cascade from a containing emma:one-of element to the individual interpretations. In the example above, the emma:start, emma:end, emma:medium, and emma:mode attributes are all specified once on emma:one-of but apply to both of the contained emma:interpretation elements. This is an important mechanism as it limits the need to repeat annotations. More details on the scope of annotations among EMMA structural elements, and also on the scope of annotations within derivations, where multiple different processing stages apply to an input, can be found in Section 4.3.

Many EMMA elements allow for content to be specified either inline or by reference using the ref attribute. This is an important mechanism as it allows EMMA documents to be less verbose while still allowing the EMMA consumer to access content from an external document, possibly on a remote server. For example, in the case of emma:grammar a grammar can either be specified inline within the element, or the ref attribute on emma:grammar can indicate the location where the grammar document can be retrieved. Similarly with emma:model a data model can be specified inline or by reference through the ref attribute. A ref attribute can also be used on the EMMA container elements emma:sequence, emma:one-of, emma:group, and emma:lattice. In these cases, the ref attribute provides a pointer to a portion of an external EMMA document, possibly on a remote server. This can be achieved using URI ID references to pick out a particular element within the external EMMA document. One use case for ref with the container elements is to allow for inline content to be partial and for the ref to provide access to the full content. For example, in the case of emma:one-of, an EMMA document delivered to an EMMA consumer could contain an abbreviated list of interpretations, e.g. the top 3, while an emma:one-of element accessible through the URI in ref contains a more inclusive list of 20 emma:interpretation elements. The emma:partial-content attribute MUST be used on the partially specified element if the ref refers to a more fully specified element. The emma:ref attribute can also be used on emma:info, emma:parameters, and emma:annotation. The use of ref on specific elements is described and exemplified in the specific section describing each element.

2.1 Data model

An EMMA data model expresses the constraints on the structure and content of instance data, for the purposes of validation. As such, the data model may be considered as a particular kind of annotation (although, unlike other EMMA annotations, it is not a feature pertaining to a specific user input at a specific moment in time; it is rather a static and, by its very definition, application-specific structure). The specification of a data model in EMMA is optional.

Since Web applications today use different formats to specify data models, e.g. XML Schema Part 1: Structures Second Edition [XML Schema Structures], XForms 1.0 (Second Edition) [XFORMS], RELAX NG Specification [RELAX-NG], etc., EMMA itself is agnostic to the format of the data model used.

Data model definition and reference are defined in Section 4.1.1.
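As an illustrative sketch, anticipating the emma:model element and emma:model-ref annotation defined in Section 4.1.1, an interpretation might reference an XML Schema data model as follows (the schema URI and instance content are hypothetical):

<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:model id="model1" ref="http://www.example.com/models/flight.xsd"/>
  <emma:interpretation id="int1" emma:model-ref="model1"
      emma:medium="acoustic" emma:mode="voice">
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>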

2.2 EMMA namespace prefixes

An EMMA attribute is qualified with the EMMA namespace prefix if the attribute can also be used as an in-line annotation on elements in the application's namespace. Most of the EMMA annotation attributes in Section 4.2 are in this category. An EMMA attribute is not qualified with the EMMA namespace prefix if the attribute only appears on an EMMA element. This rule ensures consistent usage of the attributes across all examples.

Attributes from other namespaces are permissible on all EMMA elements. As an example, xml:lang may be used to annotate the human language of character data content.
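For example, in the following sketch the EMMA-qualified attribute emma:confidence annotates elements in the application's namespace in-line, and xml:lang annotates the language of their character data content (the confidence values are illustrative):

<emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice">
  <origin emma:confidence="0.9" xml:lang="en">Boston</origin>
  <destination emma:confidence="0.6" xml:lang="en">Denver</destination>
</emma:interpretation>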

3. EMMA structural elements

This section defines elements in the EMMA namespace which provide the structural syntax of EMMA documents.

3.1 Root element: emma:emma

Annotation: emma:emma
Definition: The root element of an EMMA document.
Children: The emma:emma element MUST immediately contain a single emma:interpretation element or EMMA container element: emma:one-of, emma:group, emma:sequence. It MAY also contain an optional single emma:derivation element. It MAY also contain multiple optional emma:grammar elements, emma:model elements, emma:endpoint-info elements, emma:info elements, emma:process-model elements, emma:parameters elements, and emma:annotation elements. It MAY also contain a single emma:location element.
Attributes:
  • Required:
    • version: the version of EMMA used for the interpretation(s). Interpretations expressed using this specification MUST use 1.1 for the value.
    • Namespace declaration for EMMA, see below.
  • Optional:
    • any other namespace declarations for application specific namespaces.
    • doc-ref: an attribute of type xsd:anyURI providing a URI indicating the location on a server where the EMMA document with emma:emma as root can be retrieved from.
    • prev-doc: an attribute of type xsd:anyURI providing a URI indicating the location on a server where the EMMA document previous to this EMMA document in the sequence of interaction can be retrieved from.
Applies to: None

The root element of an EMMA document is named emma:emma. It holds a single emma:interpretation or EMMA container element (emma:one-of, emma:sequence, emma:group). It MAY also contain a single emma:derivation element containing earlier stages of the processing of the input (see Section 4.1.2). It MAY also contain multiple optional emma:grammar, emma:model, emma:endpoint-info, emma:info, emma:process-model, emma:parameters, and emma:annotation elements.

It MAY hold attributes for information pertaining to EMMA itself, along with any namespaces which are declared for the entire document, and any other EMMA annotative data. The emma:emma element and other elements and attributes defined in this specification belong to the XML namespace identified by the URI "http://www.w3.org/2003/04/emma". In the examples, the EMMA namespace is generally declared using the attribute xmlns:emma on the root emma:emma element. EMMA processors MUST support the full range of ways of declaring XML namespaces as defined by Namespaces in XML 1.1 (Second Edition) [XMLNS]. Application markup MUST be declared either in an explicit application namespace, or in an undefined namespace by setting xmlns="".

For example:

<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma">
    ....
</emma:emma>

or

<emma version="1.1" xmlns="http://www.w3.org/2003/04/emma">
    ....
</emma>

The optional attributes doc-ref and prev-doc MAY be used on emma:emma in order to indicate the location where the EMMA document comprising that emma:emma element can be retrieved from, and the location of the previous EMMA document in a sequence of interactions. One important use case for doc-ref is client side logging. A client receiving an EMMA document can record the URI found in doc-ref in a log file instead of a local copy of the whole EMMA document. The prev-doc attribute provides a mechanism for tracking a sequence of EMMA documents representing the results of processing distinct turns of interaction by an EMMA processor.

In the following example, doc-ref provides a URI which indicates where the EMMA document embodied in this emma:emma can be retrieved from.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    doc-ref="http://example.com/trainapp/user123/emma0727080512.xml">
  <emma:interpretation
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:verbal="true"
      emma:signal="http://example.com/audio/input678.amr"
      emma:process="http://example.com/asr/params.xml"
      emma:tokens="trains to london tomorrow">
    <destination>London</destination>
    <date>tomorrow</date>
  </emma:interpretation>
</emma:emma>

In the following example, again doc-ref indicates where the EMMA document can be retrieved from, but in addition prev-doc indicates where the previous EMMA document can be retrieved from.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    doc-ref="http://example.com/trainapp/user123/emma0730080512.xml"
    prev-doc="http://example.com/trainapp/user123/emma0727080512.xml">
  <emma:interpretation
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:verbal="true"
      emma:signal="http://example.com/audio/input679.amr"
      emma:process="http://example.com/asr/params.xml"
      emma:tokens="from cambridge">
    <origin>Cambridge</origin>
  </emma:interpretation>
</emma:emma>
EMMA processors may use a number of different techniques to determine the prev-doc. It may, for example, be determined based on the session. In a session of interaction, a server processing requests can track the previous EMMA result for a client and indicate that in prev-doc. Alternatively, the URI of the last EMMA result could be passed in as a parameter in a request to an EMMA processor and returned in the prev-doc with the next result.

3.2 Interpretation element: emma:interpretation

Annotation: emma:interpretation
Definition: The emma:interpretation element acts as a wrapper for application instance data or lattices.
Children: The emma:interpretation element MUST immediately contain either application instance data, or a single emma:lattice element, or a single emma:literal element; in the case of uninterpreted input or no input, emma:interpretation MUST be empty. It MAY also contain multiple optional emma:derived-from elements and an optional single emma:info element. It MAY also contain multiple optional emma:annotation elements. It MAY also contain multiple emma:parameters elements. It MAY also contain a single optional emma:grammar-active element. It MAY also contain a single emma:location element.
Attributes:
  • Required: Attribute id of type xsd:ID that uniquely identifies the interpretation within the EMMA document.
  • Optional: The annotation attributes: emma:tokens, emma:process, emma:no-input, emma:uninterpreted, emma:lang, emma:signal, emma:signal-size, emma:media-type, emma:confidence, emma:source, emma:start, emma:end, emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start, emma:duration, emma:medium, emma:mode, emma:function, emma:verbal, emma:cost, emma:grammar-ref, emma:endpoint-info-ref, emma:model-ref, emma:dialog-turn, emma:info-ref, emma:parameter-ref, emma:process-model-ref, emma:annotated-tokens, emma:result-format, emma:expressed-through, emma:device-type.
Applies to: The emma:interpretation element is legal only as a child of emma:emma, emma:group, emma:one-of, emma:sequence, or emma:derivation.

The emma:interpretation element holds a single interpretation represented in application specific markup, or a single emma:lattice element, or a single emma:literal element.

The emma:interpretation element MUST be empty if it is marked with emma:no-input="true" (Section 4.2.3). The emma:interpretation element MUST be empty if it has been annotated with emma:uninterpreted="true" (Section 4.2.4) or emma:function="recording" (Section 4.2.11).
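For example, a timeout with no user input might be represented by the following sketch of an empty interpretation (the id and timestamp values are illustrative):

<emma:interpretation id="int1" emma:no-input="true"
    emma:medium="acoustic" emma:mode="voice"
    emma:start="1087995961542" emma:end="1087995963542"/>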

Attributes:

  1. id: a REQUIRED xsd:ID value that uniquely identifies the interpretation within the EMMA document.

<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    ...
  </emma:interpretation>
</emma:emma>

While emma:medium and emma:mode are optional on emma:interpretation, note that all EMMA interpretations must be annotated for emma:medium and emma:mode, so either these attributes must appear directly on emma:interpretation, or they must appear on an ancestor emma:one-of node, or they must appear on an earlier stage of the derivation listed in emma:derivation.

3.3 Container elements

3.3.1 emma:one-of element

Annotation: emma:one-of
Definition: A container element indicating a disjunction among a collection of mutually exclusive interpretations of the input.
Children: The emma:one-of element MUST immediately contain a collection of one or more emma:interpretation elements or container elements (emma:one-of, emma:group, emma:sequence) UNLESS it is annotated with ref. It MAY also contain multiple optional emma:derived-from elements and multiple emma:info elements. It MAY also contain multiple optional emma:annotation elements. It MAY also contain multiple optional emma:parameters elements. It MAY also contain a single optional emma:grammar-active element. It MAY also contain a single emma:lattice element containing the lattice result for the same input. It MAY also contain a single emma:location element.
Attributes:
  • Required:
    • Attribute id of type xsd:ID
    • The attribute disjunction-type MUST be present if emma:one-of is embedded within emma:one-of. The possible values of disjunction-type are {recognition, understanding, multi-device, and multi-process}.
  • Optional:
    • On a single non-embedded emma:one-of the attribute disjunction-type is optional.
    • A single ref attribute of type xsd:anyURI providing a reference to a location where the content of the element can be retrieved from
    • An emma:partial-content attribute of type xsd:boolean indicating whether the content inside the element is partial and more can be retrieved from an external document through ref
    • The following annotation attributes are optional: emma:tokens, emma:process, emma:lang, emma:signal, emma:signal-size, emma:media-type, emma:confidence, emma:source, emma:start, emma:end, emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start, emma:duration, emma:medium, emma:mode, emma:function, emma:verbal, emma:cost, emma:grammar-ref, emma:endpoint-info-ref, emma:model-ref, emma:dialog-turn, emma:info-ref, emma:parameter-ref, emma:process-model-ref, emma:annotated-tokens, emma:result-format, emma:expressed-through, emma:device-type.
Applies to: The emma:one-of element MAY only appear as a child of emma:emma, emma:one-of, emma:group, emma:sequence, or emma:derivation.

The emma:one-of element acts as a container for a collection of one or more interpretation (emma:interpretation) or container elements (emma:one-of, emma:group, emma:sequence), and denotes that these are mutually exclusive interpretations.

An N-best list of choices in EMMA MUST be represented as a set of emma:interpretation elements contained within an emma:one-of element. For instance, a series of different recognition results in speech recognition might be represented in this way.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice"
      ref="http://www.example.com/i156/emma.xml#r1">
    <emma:interpretation>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation>
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

The function of the emma:one-of element is to represent a disjunctive list of possible interpretations of a user input. A disjunction of possible interpretations of an input can be the result of different kinds of processing or ambiguity. One source is multiple results from a recognition technology such as speech or handwriting recognition. Multiple results can also occur from parsing or understanding natural language. Another possible source of ambiguity is the application of multiple different kinds of recognition or understanding components to the same input signal. For example, a single ink input signal might be processed by both handwriting recognition and gesture recognition. Another is the use of more than one recording device for the same input (multiple microphones).

The optional ref attribute indicates a location where a copy of the content within the emma:one-of element can be retrieved from an external document, possibly located on a remote server.

In order to make explicit these different kinds of multiple interpretations and allow for concise statement of the annotations associated with each, the emma:one-of element MAY appear within another emma:one-of element. If emma:one-of elements are nested then they MUST indicate the kind of disjunction using the attribute disjunction-type. The values of disjunction-type are {recognition, understanding, multi-device, and multi-process}. For the most common use case, where there are multiple recognition results and some of them have multiple interpretations, the top-level emma:one-of has the attribute disjunction-type="recognition" and the embedded emma:one-of has the attribute disjunction-type="understanding".

As an example, in an interactive flight reservation application, if recognition yielded 'Boston' or 'Austin' and each had a semantic interpretation as either the assertion of a city name or the specification of a flight query with the city as the destination, this would be represented as follows in EMMA:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of disjunction-type="recognition"
      emma:start="12457990" emma:end="12457995"
      emma:medium="acoustic" emma:mode="voice">
    <emma:one-of disjunction-type="understanding"
        emma:tokens="boston">
      <emma:interpretation>
        <assert><city>boston</city></assert>
      </emma:interpretation>
      <emma:interpretation>
        <flight><dest><city>boston</city></dest></flight>
      </emma:interpretation>
    </emma:one-of>
    <emma:one-of disjunction-type="understanding"
        emma:tokens="austin">
      <emma:interpretation>
        <assert><city>austin</city></assert>
      </emma:interpretation>
      <emma:interpretation>
        <flight><dest><city>austin</city></dest></flight>
      </emma:interpretation>
    </emma:one-of>
  </emma:one-of>
</emma:emma>

EMMA MAY explicitly represent ambiguity resulting from different processes, devices, or sources using embedded emma:one-of and the disjunction-type attribute. Multiple different interpretations resulting from different factors MAY also be listed within a single unstructured emma:one-of, though in this case it is more complex or impossible to uncover the sources of the ambiguity if required by later stages of processing. If there is no embedding in emma:one-of, then the disjunction-type attribute is not required. If the disjunction-type attribute is missing then by default the source of disjunction is unspecified.

The example case above could also be represented as:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:start="12457990" emma:end="12457995"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:tokens="boston">
      <assert><city>boston</city></assert>
    </emma:interpretation>
    <emma:interpretation emma:tokens="boston">
      <flight><dest><city>boston</city></dest></flight>
    </emma:interpretation>
    <emma:interpretation emma:tokens="austin">
      <assert><city>austin</city></assert>
    </emma:interpretation>
    <emma:interpretation emma:tokens="austin">
      <flight><dest><city>austin</city></dest></flight>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

But in this case information about which interpretations resulted from speech recognition and which resulted from language understanding is lost.

A list of emma:interpretation elements within an emma:one-of MUST be sorted best-first by some measure of quality. The quality measure is emma:confidence if present; otherwise, the quality metric is platform-specific.

With embedded emma:one-of structures there is no requirement for the confidence scores within different emma:one-of to be on the same scale. For example, the scores assigned by handwriting recognition might not be comparable to those assigned by gesture recognition. Similarly, if multiple recognizers are used there is no guarantee that their confidence scores will be comparable. For this reason the ordering requirement on emma:interpretation within emma:one-of only applies locally to sister emma:interpretation elements within each emma:one-of. There is no requirement on the ordering of embedded emma:one-of elements within a higher emma:one-of element.

While emma:medium and emma:mode are optional on emma:one-of, note that all EMMA interpretations must be annotated for emma:medium and emma:mode, so either these annotations must appear directly on all of the contained emma:interpretation elements within the emma:one-of, or they must appear on the emma:one-of element itself, or they must appear on an ancestor emma:one-of element, or they must appear on an earlier stage of the derivation listed in emma:derivation.

An important use case for ref on emma:one-of is to allow an EMMA processor to return an abbreviated list of container elements such as emma:interpretation within an emma:one-of and use the ref attribute to provide a reference to a more fully specified set. In these cases, the emma:one-of MUST be annotated with the emma:partial-content="true" attribute.

In the following example the EMMA document received has two interpretations within emma:one-of. The emma:partial-content="true" annotation provides an indication that there are more interpretations, and those can be retrieved by accessing the URI in ref: "http://www.example.com/emma_021210_10.xml#r1".

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice"
      ref="http://www.example.com/emma_021210_10.xml#r1"
      emma:partial-content="true">
    <emma:interpretation emma:tokens="from boston to denver"
        emma:confidence="0.9">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:tokens="from austin to denver"
        emma:confidence="0.7">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

Where the document at "http://www.example.com/emma_021210_10.xml" is as follows, and there are two more interpretations within the emma:one-of with id "r1".

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"
      emma:partial-content="false">
    <emma:interpretation emma:tokens="from boston to denver"
        emma:confidence="0.9">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:tokens="from austin to denver"
        emma:confidence="0.7">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:tokens="from tustin to denver"
        emma:confidence="0.3">
      <origin>Tustin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:tokens="from tustin to dallas"
        emma:confidence="0.1">
      <origin>Tustin</origin>
      <destination>Dallas</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

It is also possible to specify a lattice of results alongside an N-best list of interpretations in emma:one-of. A single emma:lattice element can appear as a child of emma:one-of and contains a lattice representation of the processing of the same input resulting in the interpretations that appear within the emma:one-of. In this example, there are two N-best results and the emma:lattice enumerates two more, as it includes arcs for "tomorrow" vs "today".

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:tokens="flights from boston to denver tomorrow">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
    <emma:interpretation emma:tokens="flights from austin to denver tomorrow">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
    <emma:lattice initial="1" final="7">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">from</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="3" to="4">austin</emma:arc>
      <emma:arc from="4" to="5">to</emma:arc>
      <emma:arc from="5" to="6">denver</emma:arc>
      <emma:arc from="6" to="7">today</emma:arc>
      <emma:arc from="6" to="7">tomorrow</emma:arc>
    </emma:lattice>
  </emma:one-of>
</emma:emma>

3.3.2 emma:group element

Annotation: emma:group
Definition: A container element indicating that a number of interpretations of distinct user inputs are grouped according to some criteria.
Children: The emma:group element MUST immediately contain a collection of one or more emma:interpretation elements or container elements (emma:one-of, emma:group, emma:sequence). It MAY also contain an optional single emma:group-info element. It MAY also contain multiple optional emma:derived-from elements and multiple emma:info elements. It MAY also contain multiple optional emma:annotation elements. It MAY also contain multiple optional emma:parameters elements. It MAY also contain a single optional emma:grammar-active element. It MAY also contain a single emma:location element.
Attributes:
  • Required: Attribute id of type xsd:ID
  • Optional:
    • A single ref attribute of type xsd:anyURI providing a reference to a location where the content of the element can be retrieved from
    • An emma:partial-content attribute of type xsd:boolean indicating whether the content inside the element is partial and more can be retrieved from an external document through ref
    • The annotation attributes: emma:tokens, emma:process, emma:lang, emma:signal, emma:signal-size, emma:media-type, emma:confidence, emma:source, emma:start, emma:end, emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start, emma:duration, emma:medium, emma:mode, emma:function, emma:verbal, emma:cost, emma:grammar-ref, emma:endpoint-info-ref, emma:model-ref, emma:dialog-turn, emma:info-ref, emma:parameter-ref, emma:process-model-ref, emma:annotated-tokens, emma:result-format, emma:expressed-through, emma:device-type.
Applies to: The emma:group element is legal only as a child of emma:emma, emma:one-of, emma:group, emma:sequence, or emma:derivation.

The emma:group element is used to indicate that the contained interpretations are from distinct user inputs that are related in some manner. emma:group MUST NOT be used for containing the multiple stages of processing of a single user input. Those MUST be contained in the emma:derivation element instead (Section 4.1.2). For groups of inputs in temporal order the more specialized container emma:sequence MUST be used (Section 3.3.3). The following example shows three interpretations derived from the speech input "Move this ambulance here" and the tactile input related to two consecutive points on a map.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:group
      emma:start="1087995961542"
      emma:end="1087995964542">
    <emma:interpretation emma:medium="acoustic" emma:mode="voice">
      <action>move</action>
      <object>ambulance</object>
      <destination>here</destination>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>

The emma:one-of and emma:group containers MAY be nested arbitrarily.

Like emma:one-of, the contents of emma:group may be partial, indicated by emma:partial-content="true", with the full set of group members retrieved by accessing the element referenced in ref.

3.3.2.1 Indirect grouping criteria: emma:group-info element

Annotation: emma:group-info
Definition: The emma:group-info element contains or references criteria used in establishing the grouping of interpretations in an emma:group element.
Children: The emma:group-info element MUST either immediately contain inline instance data specifying grouping criteria or have the attribute ref referencing the criteria.
Attributes:
  • Optional: ref of type xsd:anyURI referencing the grouping criteria; alternatively the criteria MAY be provided inline as the content of the emma:group-info element.
Applies to: The emma:group-info element is legal only as a child of emma:group.

Sometimes it may be convenient to indirectly associate a given group with information, such as grouping criteria. The emma:group-info element might be used to make explicit the criteria by which members of a group are associated. In the following example, a group of two points is associated with a description of grouping criteria based upon a sliding temporal window of two seconds duration.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/ns/group">
  <emma:group>
    <emma:group-info>
      <ex:mode>temporal</ex:mode>
      <ex:duration>2s</ex:duration>
    </emma:group-info>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>

You might also use emma:group-info to refer to a named grouping criterion using an external reference, for instance:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/ns/group">
  <emma:group>
    <emma:group-info ref="http://www.example.com/criterion42"/>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>

3.3.3 emma:sequence element

Annotation: emma:sequence
Definition: A container element indicating that a number of interpretations of distinct user inputs are in temporal sequence.
Children: The emma:sequence element MUST immediately contain a collection of one or more emma:interpretation elements or container elements (emma:one-of, emma:group, emma:sequence). It MAY also contain multiple optional emma:derived-from elements and multiple emma:info elements. It MAY also contain multiple optional emma:annotation elements. It MAY also contain multiple optional emma:parameters elements. It MAY also contain a single optional emma:grammar-active element. It MAY also contain a single emma:location element.
Attributes:
  • Required: Attribute id of type xsd:ID
  • Optional:
    • A single ref attribute of type xsd:anyURI providing a reference to a location where the content of the element can be retrieved from
    • An emma:partial-content attribute of type xsd:boolean indicating whether the content inside the element is partial and more can be retrieved from the server through ref
    • The annotation attributes: emma:tokens, emma:process, emma:lang, emma:signal, emma:signal-size, emma:media-type, emma:confidence, emma:source, emma:start, emma:end, emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start, emma:duration, emma:medium, emma:mode, emma:function, emma:verbal, emma:cost, emma:grammar-ref, emma:endpoint-info-ref, emma:model-ref, emma:dialog-turn, emma:info-ref, emma:parameter-ref, emma:process-model-ref, emma:annotated-tokens, emma:result-format, emma:expressed-through, emma:device-type.
Applies to: The emma:sequence element is legal only as a child of emma:emma, emma:one-of, emma:group, emma:sequence, or emma:derivation.

The emma:sequence element is used to indicate that the contained interpretations are sequential in time, as in the following example, which indicates that two points made with a pen are in temporal order.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:sequence>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>

The emma:sequence container MAY be combined with emma:one-of and emma:group in arbitrary nesting structures. The order of children in the content of the emma:sequence element corresponds to a sequence of interpretations. This ordering does not imply any particular definition of sequentiality. EMMA processors are expected therefore to use the emma:sequence element to hold interpretations which are either strictly sequential in nature (e.g. the end-time of an interpretation precedes the start-time of its follower), or which overlap in some manner (e.g. the start-time of a follower interpretation precedes the end-time of its precedent). It is possible to use timestamps to provide fine grained annotation for the sequence of interpretations that are sequential in time (see Section 4.2.10).
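As a sketch of such fine grained annotation, each interpretation in the sequence can carry its own emma:start and emma:end values (the millisecond timestamps here are illustrative):

<emma:sequence>
  <emma:interpretation emma:medium="tactile" emma:mode="ink"
      emma:start="1087995961542" emma:end="1087995961742">
    <x>0.253</x>
    <y>0.124</y>
  </emma:interpretation>
  <emma:interpretation emma:medium="tactile" emma:mode="ink"
      emma:start="1087995962042" emma:end="1087995962242">
    <x>0.866</x>
    <y>0.724</y>
  </emma:interpretation>
</emma:sequence>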

In the following more complex example, a sequence of two pen gestures in emma:sequence and a speech input in emma:interpretation are contained in an emma:group.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:group>
    <emma:interpretation emma:medium="acoustic"
        emma:mode="voice">
      <action>move</action>
      <object>this-battleship</object>
      <destination>here</destination>
    </emma:interpretation>
    <emma:sequence>
      <emma:interpretation emma:medium="tactile"
          emma:mode="ink">
        <x>0.253</x>
        <y>0.124</y>
      </emma:interpretation>
      <emma:interpretation emma:medium="tactile"
          emma:mode="ink">
        <x>0.866</x>
        <y>0.724</y>
      </emma:interpretation>
    </emma:sequence>
  </emma:group>
</emma:emma>

Like emma:one-of, the contents of emma:sequence may be partial, indicated by emma:partial-content="true", with the full set of sequence members retrieved by accessing the element referenced in ref.

3.4 Lattice element

In addition to providing the ability to represent N-best lists of interpretations using emma:one-of, EMMA also provides the capability to represent lattices of words or other symbols using the emma:lattice element. Lattices provide a compact representation of large lists of possible recognition results or interpretations for speech, pen, or multimodal inputs.

In addition to providing a representation for lattice output from speech recognition, another important use case for lattices is the representation of the results of gesture and handwriting recognition from a pen modality component. Lattices can also be used to compactly represent multiple possible meaning representations. Another use case for the lattice representation is associating confidence scores and other annotations with individual words within a speech recognition result string.

Lattices are compactly described by a list of transitions between nodes. For each transition the start and end nodes MUST be defined, along with the label for the transition. Initial and final nodes MUST also be indicated. The following figure provides a graphical representation of a speech recognition lattice which compactly represents eight different sequences of words.

[Figure: speech recognition lattice]

which expands to:

a. flights to boston from portland today please
b. flights to austin from portland today please
c. flights to boston from oakland today please
d. flights to austin from oakland today please
e. flights to boston from portland tomorrow
f. flights to austin from portland tomorrow
g. flights to boston from oakland tomorrow
h. flights to austin from oakland tomorrow

3.4.1 Lattice markup: emma:lattice, emma:arc, emma:node elements

Annotation: emma:lattice
Definition: An element which encodes a lattice representation of user input.
Children: The emma:lattice element MUST immediately contain one or more emma:arc elements and zero or more emma:node elements.
Attributes:
  • Required:
    • initial of type xsd:nonNegativeInteger indicating the number of the initial node of the lattice.
    • final containing a space-separated list of xsd:nonNegativeInteger indicating the numbers of the final nodes in the lattice.
  • Optional:
    • id of type xsd:ID
    • A single ref attribute of type xsd:anyURI providing a reference to a location from which the content of the lattice element can be retrieved
    • An emma:partial-content attribute of type xsd:boolean indicating whether the content inside the element is partial and more can be retrieved from an external document through ref
    • emma:time-ref-uri, emma:time-ref-anchor-point.
Applies to: The emma:lattice element is legal only as a child of the emma:interpretation and emma:one-of elements.
Annotation: emma:arc
Definition: An element which encodes a transition between two nodes in a lattice. The label associated with the arc in the lattice is represented in the content of emma:arc.
Children: The emma:arc element MUST immediately contain either character data or a single application namespace element, or be empty in the case of epsilon transitions. It MAY contain an emma:info element containing application or vendor specific annotations. It MAY contain zero or more optional emma:annotation elements containing annotations made by a human annotator.
Attributes:
  • Required:
    • from of type xsd:nonNegativeInteger indicating the number of the starting node for the arc.
    • to of type xsd:nonNegativeInteger indicating the number of the ending node for the arc.
  • Optional: emma:start, emma:end, emma:offset-to-start, emma:duration, emma:confidence, emma:cost, emma:lang, emma:medium, emma:mode, emma:source, emma:annotated-tokens.
Applies to: The emma:arc element is legal only as a child of the emma:lattice element.
Annotation: emma:node
Definition: An element which represents a node in the lattice. The emma:node elements are not required to describe a lattice but might be added to provide a location for annotations on nodes in a lattice. There MUST be at most one emma:node specification for each numbered node in the lattice.
Children: An OPTIONAL emma:info element for application or vendor specific annotations on the node. It MAY contain zero or more optional emma:annotation elements containing annotations made by a human annotator.
Attributes:
  • Required:
    • node-number of type xsd:nonNegativeInteger indicating the node number in the lattice.
  • Optional: emma:confidence, emma:cost.
Applies to: The emma:node element is legal only as a child of the emma:lattice element.

In EMMA, a lattice is represented using the element emma:lattice, which has attributes initial and final for indicating the initial and final nodes of the lattice. For the lattice below, this will be: <emma:lattice initial="1" final="8"/>. The nodes are numbered with integers. If there is more than one distinct final node in the lattice, the nodes MUST be represented as a space separated list in the value of the final attribute, e.g. <emma:lattice initial="1" final="9 10 23"/>. There MUST only be one initial node in an EMMA lattice. Each transition in the lattice is represented as an element emma:arc with attributes from and to which indicate the nodes where the transition starts and ends. The arc's label is represented as the content of the emma:arc element and MUST be well-formed character or XML content. In the example here the contents are words. Empty (epsilon) transitions in a lattice MUST be represented in the emma:lattice representation as empty emma:arc elements, e.g. <emma:arc from="1" to="8"/>.
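For instance, a minimal sketch of a lattice fragment in which an epsilon arc makes the word "please" optional, so both "tomorrow please" and "tomorrow" are paths through the lattice:

<emma:lattice initial="1" final="3">
  <emma:arc from="1" to="2">tomorrow</emma:arc>
  <emma:arc from="2" to="3">please</emma:arc>
  <!-- epsilon transition: allows "please" to be skipped -->
  <emma:arc from="2" to="3"/>
</emma:lattice>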

The example speech lattice above would be represented in EMMA markup as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="8">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">to</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="3" to="4">austin</emma:arc>
      <emma:arc from="4" to="5">from</emma:arc>
      <emma:arc from="5" to="6">portland</emma:arc>
      <emma:arc from="5" to="6">oakland</emma:arc>
      <emma:arc from="6" to="7">today</emma:arc>
      <emma:arc from="7" to="8">please</emma:arc>
      <emma:arc from="6" to="8">tomorrow</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>

Alternatively, if we wish to represent the same information as an N-best list using emma:one-of, we would have the more verbose representation:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation>
      <text>flights to boston from portland today please</text>
    </emma:interpretation>
    <emma:interpretation id="interp2">
      <text>flights to boston from portland tomorrow</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to austin from portland today please</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to austin from portland tomorrow</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to boston from oakland today please</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to boston from oakland tomorrow</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to austin from oakland today please</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to austin from oakland tomorrow</text>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

The lattice representation avoids the need to enumerate all of the possible word sequences. Also, as detailed below, the emma:lattice representation enables placement of annotations on individual words in the input.

For use cases involving the representation of gesture/ink lattices and use cases involving lattices of semantic interpretations, EMMA allows for application namespace elements to appear within emma:arc.

For example, a sequence of two gestures, each of which is recognized as either a line or a circle, might be represented as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="tactile" emma:mode="ink">
    <emma:lattice initial="1" final="3">
      <emma:arc from="1" to="2">
        <circle radius="100"/>
      </emma:arc>
      <emma:arc from="2" to="3">
        <line length="628"/>
      </emma:arc>
      <emma:arc from="1" to="2">
        <circle radius="200"/>
      </emma:arc>
      <emma:arc from="2" to="3">
        <line length="1256"/>
      </emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>

As an example of a lattice of semantic interpretations, in a travel application where the source is either "Boston" or "Austin" and the destination is either "Newark" or "New York", the possibilities might be represented in a lattice as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="3">
      <emma:arc from="1" to="2">
        <source city="boston"/>
      </emma:arc>
      <emma:arc from="2" to="3">
        <destination city="newark"/>
      </emma:arc>
      <emma:arc from="1" to="2">
        <source city="austin"/>
      </emma:arc>
      <emma:arc from="2" to="3">
        <destination city="new york"/>
      </emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>

The emma:arc element MAY contain either an application namespace element or character data. It MUST NOT contain combinations of application namespace elements and character data. However, an emma:info element MAY appear within an emma:arc element alongside character data, in order to allow for the association of vendor or application specific annotations on a single word or symbol in a lattice. Also an emma:annotation element MAY appear as a child of emma:arc or emma:node indicating human annotations on the arc or node.

So, in summary, there are four groupings of content that can appear within emma:arc (each is illustrated in the sketch after this list):
  • character data alone (e.g. a word);
  • character data together with an emma:info and/or emma:annotation element;
  • a single application namespace element;
  • empty content, representing an epsilon transition.
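A minimal sketch showing all four groupings side by side (the arc labels and the ex:engine-score annotation are illustrative):

<emma:lattice initial="1" final="3">
  <!-- 1. character data alone -->
  <emma:arc from="1" to="2">flights</emma:arc>
  <!-- 2. character data with an emma:info annotation -->
  <emma:arc from="1" to="2">
    lights
    <emma:info>
      <ex:engine-score xmlns:ex="http://www.example.com/ex">0.2</ex:engine-score>
    </emma:info>
  </emma:arc>
  <!-- 3. a single application namespace element -->
  <emma:arc from="2" to="3">
    <circle radius="100" xmlns="http://www.example.com/example"/>
  </emma:arc>
  <!-- 4. empty content: an epsilon transition -->
  <emma:arc from="2" to="3"/>
</emma:lattice>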

The ref attribute on emma:lattice can be used for cases where the lattice is not returned in the document, but is made accessible through ref, or for cases where the lattice is partial and a full lattice is available on the server.

For example, the following emma:lattice does not contain any emma:arc elements, but ref indicates where the lattice can be retrieved from.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="acoustic" emma:mode="voice"
      emma:tokens="flights to boston from oakland tomorrow">
    <emma:lattice initial="1" final="8"
        emma:partial-content="true"
        ref="http://www.example.com/ex1/lattice.xml#l1"/>
  </emma:interpretation>
</emma:emma>
The document on the server in this case could, for example, be as follows.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="acoustic" emma:mode="voice"
      emma:tokens="flights to boston from oakland tomorrow">
    <emma:lattice id="l1" initial="1" final="8"
        emma:partial-content="false">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">to</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="3" to="4">austin</emma:arc>
      <emma:arc from="4" to="5">from</emma:arc>
      <emma:arc from="5" to="6">portland</emma:arc>
      <emma:arc from="5" to="6">oakland</emma:arc>
      <emma:arc from="6" to="7">today</emma:arc>
      <emma:arc from="7" to="8">please</emma:arc>
      <emma:arc from="6" to="8">tomorrow</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>

Similarly, the emma:lattice could contain some arcs but not all, and point through ref to the full lattice. In this case the EMMA document received is a pruned lattice, and the full lattice can be retrieved by accessing the external document indicated in ref.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="8"
        emma:partial-content="true"
        ref="http://www.example.com/ex1/lattice.xml#l1">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">to</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="4" to="5">from</emma:arc>
      <emma:arc from="5" to="6">portland</emma:arc>
      <emma:arc from="6" to="8">tomorrow</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>

3.4.2 Annotations on lattices

The encoding of lattice arcs as XML elements (emma:arc) enables arcs to be annotated with metadata such as timestamps, costs, or confidence scores:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="8">
      <emma:arc from="1" to="2"
          emma:start="1087995961542"
          emma:end="1087995962042"
          emma:cost="30">
        flights
        <emma:annotation annotator="john_smith"
            time="2011-11-10T09:00:21"
            type="emotion"
            confidence="1.0"
            reference="false">
          <emotionml xmlns="http://www.w3.org/2009/10/emotionml">
            <emotion>
              <category set="everyday" name="angry"/>
              <modality medium="acoustic" mode="voice"/>
            </emotion>
          </emotionml>
        </emma:annotation>
      </emma:arc>
      <emma:arc from="2" to="3"
          emma:start="1087995962042"
          emma:end="1087995962542"
          emma:cost="20">
        to
      </emma:arc>
      <emma:arc from="3" to="4"
          emma:start="1087995962542"
          emma:end="1087995963042"
          emma:cost="50">
        boston
      </emma:arc>
      <emma:arc from="3" to="4"
          emma:start="1087995963042"
          emma:end="1087995963742"
          emma:cost="60">
        austin
      </emma:arc>
      ...
    </emma:lattice>
  </emma:interpretation>
</emma:emma>

The following EMMA attributes MAY be placed on emma:arc elements: absolute timestamps (emma:start, emma:end), relative timestamps (emma:offset-to-start, emma:duration), emma:confidence, emma:cost, the human language of the input (emma:lang), emma:medium, emma:mode, emma:source, and emma:annotated-tokens. The use case for emma:medium, emma:mode, and emma:source is for lattices which contain content from different input modes. The emma:arc element MAY also contain an emma:info element for specification of vendor and application specific annotations on the arc. The emma:arc and emma:node elements can also contain optional emma:annotation elements containing annotations made by human annotators. For example, in the example above emma:annotation is used to indicate manual annotation of emotion on the word 'flights'.

The timestamps that appear on emma:arc elements do not necessarily indicate the start and end of the arc itself. They MAY indicate the start and end of the signal corresponding to the label on the arc. As a result there is no requirement that the emma:end timestamp on an arc going into a node should be equivalent to the emma:start of all arcs going out of that node. Furthermore there is no guarantee that the left to right order of arcs in a lattice will correspond to the temporal order of the input signal. The lattice representation is an abstraction that represents a range of possible interpretations of a user's input and is not intended to necessarily be a representation of temporal order.

Costs are typically application and device dependent. There are a variety of ways that individual arc costs might be combined to produce costs for specific paths through the lattice; for example, a consumer might sum the arc costs along a path (30 + 20 + 50 for "flights to boston" in the example above), but this is only one possible convention. This specification does not standardize how these costs are to be combined; it is up to applications and devices to determine how such derived costs are computed and used.

For some lattice formats, it is also desirable to annotate the nodes in the lattice themselves with information such as costs. For example, in speech recognition, costs might be placed on nodes as a result of word penalties or redistribution of costs. For this purpose EMMA also provides an emma:node element which can host annotations such as emma:cost. The emma:node element MUST have an attribute node-number which indicates the number of the node. There MUST be at most one emma:node specification for a given numbered node in the lattice. In our example, if there was a cost of 100 on the final state this could be represented as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="8">
      <emma:arc from="1" to="2"
          emma:start="1087995961542"
          emma:end="1087995962042"
          emma:cost="30">
        flights
      </emma:arc>
      <emma:arc from="2" to="3"
          emma:start="1087995962042"
          emma:end="1087995962542"
          emma:cost="20">
        to
      </emma:arc>
      <emma:arc from="3" to="4"
          emma:start="1087995962542"
          emma:end="1087995963042"
          emma:cost="50">
        boston
      </emma:arc>
      <emma:arc from="3" to="4"
          emma:start="1087995963042"
          emma:end="1087995963742"
          emma:cost="60">
        austin
      </emma:arc>
      ...
      <emma:node node-number="8" emma:cost="100"/>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>

3.4.3 Relative timestamps on lattices

The relative timestamp mechanism in EMMA is intended to provide temporal information about arcs in a lattice in relative terms using offsets in milliseconds. In order to do this, the absolute time MAY be specified on emma:interpretation; both emma:time-ref-uri and emma:time-ref-anchor-point apply to emma:lattice and MAY be used there to set the anchor point for offsets to the start of the absolute time specified on emma:interpretation. The offset in milliseconds to the beginning of each arc MAY then be indicated on each emma:arc in the emma:offset-to-start attribute.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="interp1"
      emma:start="1087995961542" emma:end="1087995963042"
      emma:medium="acoustic" emma:mode="voice">
    <emma:lattice emma:time-ref-uri="#interp1"
        emma:time-ref-anchor-point="start"
        initial="1" final="4">
      <emma:arc from="1" to="2"
          emma:offset-to-start="0">
        flights
      </emma:arc>
      <emma:arc from="2" to="3"
          emma:offset-to-start="500">
        to
      </emma:arc>
      <emma:arc from="3" to="4"
          emma:offset-to-start="1000">
        boston
      </emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>

Note that the offset for the first emma:arc MUST always be zero, since the EMMA attribute emma:offset-to-start indicates the number of milliseconds from the anchor point to the start of the piece of input associated with the emma:arc, in this case the word "flights".

3.5 Literal semantics: emma:literal element

Annotation: emma:literal
Definition: An element that contains string literal output.
Children: String literal
Attributes: An optional emma:result-format attribute.
Applies to: The emma:literal element is legal only as a child of emma:interpretation.

Certain EMMA processing components produce semantic results in the form of string literals without any surrounding application namespace markup. These MUST be placed within the EMMA element emma:literal within emma:interpretation. For example, if a semantic interpreter simply returned "boston", this could be represented in EMMA as:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="r1"
      emma:medium="acoustic" emma:mode="voice">
    <emma:literal>boston</emma:literal>
  </emma:interpretation>
</emma:emma>

Note that a raw recognition result of a sequence of words from speech recognition is also a kind of string literal and can be contained within emma:literal. For example, recognition of the string "flights to san francisco" can be represented in EMMA as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="r1"
      emma:medium="acoustic" emma:mode="voice">
    <emma:literal>flights to san francisco</emma:literal>
  </emma:interpretation>
</emma:emma>

4. EMMA annotations

This section defines annotations in the EMMA namespace, including both attributes and elements. The values are specified in terms of the data types defined by XML Schema Part 2: Datatypes Second Edition [XML Schema Datatypes].

4.1 EMMA annotation elements

4.1.1 Data model: emma:model element

Annotation: emma:model
Definition: The emma:model element either references or provides inline the data model for the instance data.
Children: If a ref attribute is not specified then this element contains the data model inline.
Attributes:
  • Required:
    • id of type xsd:ID.
  • Optional:
    • ref of type xsd:anyURI that references the data model. Note that either a ref attribute or an in-line data model (but not both) MUST be specified.
Applies to: The emma:model element MAY appear only as a child of emma:emma.

The data model that may be used to express constraints on the structure and content of instance data is specified as one of the annotations of the instance. Specifying the data model is OPTIONAL; if it is not specified, the data model can be said to be implicit. Typically the data model is pre-established by the application.

The data model is specified with the emma:model annotation, defined as an element in the EMMA namespace. If the data model for the contents of an emma:interpretation, container element, or application namespace element is to be specified in EMMA, the attribute emma:model-ref MUST be specified on the emma:interpretation, container element, or application namespace element. Note that since multiple emma:model elements might be specified under emma:emma, it is possible to refer to multiple data models within a single EMMA document. For example, different alternative interpretations under an emma:one-of might have different data models. In this case, an emma:model-ref attribute would appear on each emma:interpretation element in the N-best list with its value being the id of the emma:model element for that particular interpretation.

The data model is closely related to the interpretation data, and is typically specified as an annotation related to the emma:interpretation or emma:one-of elements.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:model id="model1" ref="http://example.com/models/city.xml"/>
  <emma:interpretation emma:model-ref="model1"
      emma:medium="acoustic" emma:mode="voice">
    <city>London</city>
    <country>UK</country>
  </emma:interpretation>
</emma:emma>

The emma:model annotation MAY reference any element or attribute in the application instance data, as well as any EMMA container element (emma:one-of, emma:group, or emma:sequence).

The data model annotation MAY be used either to reference an external data model with the ref attribute or to provide a data model as in-line content. Either a ref attribute or an in-line data model (but not both) MUST be specified.
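For instance, a sketch of the in-line alternative (the XML Schema fragment shown is illustrative only):

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:model id="model1">
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="city" type="xs:string"/>
      <xs:element name="country" type="xs:string"/>
    </xs:schema>
  </emma:model>
  <emma:interpretation emma:model-ref="model1"
      emma:medium="acoustic" emma:mode="voice">
    <city>London</city>
    <country>UK</country>
  </emma:interpretation>
</emma:emma>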

Note that unlike the use of ref on e.g. emma:one-of, it is not possible in EMMA to provide a partial specification of the data model inline and use emma:partial-content="true" to indicate that the full data model is available from the URI in ref.

4.1.2 Interpretation derivation: emma:derived-from element and emma:derivation element

Annotation: emma:derived-from
Definition: An empty element which provides a reference to the interpretation from which the element it appears on was derived.
Children: None
Attributes:
  • Required:
    • resource of type xsd:anyURI that references the interpretation from which the current interpretation is derived.
  • Optional:
    • composite of type xsd:boolean that is "true" if the derivation step combines multiple inputs and "false" if not. If composite is not specified the value is "false" by default.
Applies to: The emma:derived-from element is legal only as a child of emma:interpretation, emma:one-of, emma:group, or emma:sequence.
Annotation: emma:derivation
Definition: An element which contains interpretation and container elements representing earlier stages in the processing of the input.
Children: One or more emma:interpretation, emma:one-of, emma:sequence, or emma:group elements.
Attributes: None
Applies to: The emma:derivation element MAY appear only as a child of the emma:emma, emma:interpretation, emma:one-of, emma:group, and emma:sequence elements.

Instances of interpretations are in general derived from other instances of interpretation in a process that goes from raw data to increasingly refined representations of the input. The derivation annotation is used to link any two interpretations that are related by representing the source and the outcome of an interpretation process. For instance, a speech recognition process can return the following result in the form of raw text:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <answer>From Boston to Denver tomorrow</answer>
  </emma:interpretation>
</emma:emma>

A first interpretation process will produce:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>tomorrow</date>
  </emma:interpretation>
</emma:emma>

A second interpretation process, aware of the current date, will be able to produce a more refined instance, such as:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>20030315</date>
  </emma:interpretation>
</emma:emma>

The interaction manager might need to have access to the three levels of interpretation. The emma:derived-from annotation element can be used to establish a chain of derivation relationships as in the following example:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw"
        emma:medium="acoustic" emma:mode="voice">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation>
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>20030315</date>
  </emma:interpretation>
</emma:emma>

The emma:derivation element MAY be used as a container for representations of the earlier stages in the interpretation of the input. The emma:derivation element MAY appear only as a child of the emma:emma, emma:interpretation, emma:one-of, emma:group, and emma:sequence elements. That is, it can be a child of emma:emma or of any container element except literal or lattice. If emma:derivation appears within a container it MUST apply to that specific element, or to a descendant of that element. The latest stage of processing MUST be a direct child of emma:emma.

The resource attribute on emma:derived-from is a URI which can reference IDs in the current or other EMMA documents. Since emma:derivation elements can appear in multiple different places, EMMA processors MUST use the emma:derived-from element to identify earlier stages of the processing of an input, rather than the document structure. The option to have emma:derivation in locations other than directly under emma:emma is provided to make the document more transparent and improve human readability.
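For instance, a sketch (with a hypothetical URI) in which the earlier stage of processing lives in a separately logged EMMA document rather than the current one:

<emma:interpretation>
  <emma:derived-from
      resource="http://example.com/logs/utt42/emma.xml#raw"
      composite="false"/>
  <origin>Boston</origin>
</emma:interpretation>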

In the following example, emma:sequence is used to represent a sequence of two spoken inputs, and each has its own emma:derivation element containing the previous stage of processing.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:sequence>
    <emma:interpretation>
      <emma:derived-from resource="#raw1" composite="false"/>
      <origin>Boston</origin>
      <emma:derivation>
        <emma:interpretation id="raw1"
            emma:medium="acoustic" emma:mode="voice">
          <emma:literal>flights from boston</emma:literal>
        </emma:interpretation>
      </emma:derivation>
    </emma:interpretation>
    <emma:interpretation>
      <emma:derived-from resource="#raw2" composite="false"/>
      <destination>Denver</destination>
      <emma:derivation>
        <emma:interpretation id="raw2"
            emma:medium="acoustic" emma:mode="voice">
          <emma:literal>to denver</emma:literal>
        </emma:interpretation>
      </emma:derivation>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>

In addition to representing sequential derivations, the EMMA emma:derived-from element can also be used to capture composite derivations. Composite derivations involve the combination of inputs from different modes.

In order to indicate whether an emma:derived-from element describes a sequential derivation step or a composite derivation step, the emma:derived-from element has an attribute composite which has a boolean value. A composite emma:derived-from MUST be marked as composite="true" while a sequential emma:derived-from element is marked as composite="false". If this attribute is not specified the value is false by default.

In the following composite derivation example, the user said "destination" using the voice mode and circled Boston on a map using the ink mode:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="voice1"
        emma:start="1087995961500"
        emma:end="1087995962542"
        emma:process="http://example.com/myasr.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:signal="http://example.com/signals/sg23.wav"
        emma:confidence="0.6"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:function="dialog"
        emma:verbal="true"
        emma:lang="en-US"
        emma:tokens="destination">
      <rawinput>destination</rawinput>
    </emma:interpretation>
    <emma:interpretation id="ink1"
        emma:start="1087995961600"
        emma:end="1087995964000"
        emma:process="http://example.com/mygesturereco.xml"
        emma:source="http://example.com/pen/wacom123"
        emma:signal="http://example.com/signals/ink5.inkml"
        emma:confidence="0.5"
        emma:medium="tactile"
        emma:mode="ink"
        emma:function="dialog"
        emma:verbal="false">
      <rawinput>Boston</rawinput>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation
      emma:confidence="0.3"
      emma:start="1087995961500"
      emma:end="1087995964000"
      emma:medium="acoustic tactile"
      emma:mode="voice ink"
      emma:function="dialog"
      emma:verbal="true"
      emma:lang="en-US"
      emma:tokens="destination">
    <emma:derived-from resource="#voice1" composite="true"/>
    <emma:derived-from resource="#ink1" composite="true"/>
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>

In this example, annotations on the multimodal interpretation indicate the process used for the integration, and there are two emma:derived-from elements, one pointing to the speech input and one pointing to the pen gesture.

The only constraints the EMMA specification places on the annotations that appear on a composite input are that the emma:medium attribute MUST contain the union of the emma:medium attributes on the combining inputs, represented as a space delimited set of nmtokens as defined in Section 4.2.11, and that the emma:mode attribute MUST contain the union of the emma:mode attributes on the combining inputs, represented as a space delimited set of nmtokens as defined in Section 4.2.11. In the example above this means that the emma:medium value is "acoustic tactile" and the emma:mode attribute is "voice ink". How all other annotations are handled is author defined. In the following paragraph, informative examples of how specific annotations might be handled are given.

With reference to the illustrative example above, this paragraph provides informative guidance regarding the determination of annotations (beyond emma:medium and emma:mode) on a composite multimodal interpretation. Generally the timestamp on a combined input should contain the intervals indicated by the combining inputs. For the absolute timestamps emma:start and emma:end this can be achieved by taking the earlier of the emma:start values (emma:start="1087995961500" in our example) and the later of the emma:end values (emma:end="1087995964000" in the example). The determination of relative timestamps for composite inputs is more complex; informative guidance is given in Section 4.2.10.4. Generally speaking, the emma:confidence value will be some numerical combination of the confidence scores assigned to the combining inputs. In our example, it is the result of multiplying the voice and ink confidence scores (0.3). In other cases there may not be a confidence score for one of the combining inputs, and the author may choose to copy the confidence score from the input which does have one. Generally, for emma:verbal, if either of the inputs has the value true then the multimodal interpretation will also be emma:verbal="true", as in the example. In other words, the annotation for the composite input is the result of an inclusive OR of the boolean values of the annotations on the inputs. If an annotation is only specified on one of the combining inputs then it may in some cases be assumed to apply to the multimodal interpretation of the composite input. In the example, emma:lang="en-US" is only specified for the speech input, and this annotation appears on the composite result also. Similarly in our example, only the voice input has emma:tokens, and the author has chosen to annotate the combined input with the same emma:tokens value. In this example, the emma:function is the same on both combining inputs and the author has chosen to use the same annotation on the composite interpretation.

In annotating derivations of the processing of the input, EMMA provides the flexibility of both coarse-grained and fine-grained annotation of relations among interpretations. For example, when relating two N-best lists within emma:one-of elements, either there can be a single emma:derived-from element under emma:one-of referring to the ID of the emma:one-of for the earlier processing stage:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:one-of id="nbest1" emma:medium="acoustic" emma:mode="voice">
      <emma:interpretation>
        <res>from boston to denver on march eleven two thousand three</res>
      </emma:interpretation>
      <emma:interpretation>
        <res>from austin to denver on march eleven two thousand three</res>
      </emma:interpretation>
    </emma:one-of>
  </emma:derivation>
  <emma:one-of>
    <emma:derived-from resource="#nbest1" composite="false"/>
    <emma:interpretation>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation>
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

Or there can be a separate emma:derived-from element on each emma:interpretation element referring to the specific emma:interpretation element it was derived from.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of>
    <emma:interpretation>
      <emma:derived-from resource="#int1" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation>
      <emma:derived-from resource="#int2" composite="false"/>
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
  <emma:derivation>
    <emma:one-of emma:medium="acoustic" emma:mode="voice">
      <emma:interpretation id="int1">
        <res>from boston to denver on march eleven two thousand three</res>
      </emma:interpretation>
      <emma:interpretation id="int2">
        <res>from austin to denver on march eleven two thousand three</res>
      </emma:interpretation>
    </emma:one-of>
  </emma:derivation>
</emma:emma>

Section 4.3 provides further examples of the use of emma:derived-from to represent sequential derivations and addresses the issue of the scope of EMMA annotations across derivations of user input.

4.1.3 Reference to grammar used: emma:grammar element

Annotation: emma:grammar
Definition: An element used to indicate the grammar used in processing the input. The grammar MUST either be specified inline OR referenced using the ref attribute.
Children: In the case of inline specification of the grammar, this element contains an element with the specification of the grammar.
Attributes:
  • Optional:
    • grammar-type of type xsd:string that indicates the MIME type of the grammar
    • ref of type xsd:anyURI that references a grammar used in processing the input.
  • Required:
    • id of type xsd:ID.
Applies to: The emma:grammar element is legal only as a child of the emma:emma element.

The grammar that was used to derive the EMMA result MAY be specified with the emma:grammar annotation, defined as an element in the EMMA namespace. The emma:grammar-ref attribute appears on the specific interpretation and references the appropriate emma:grammar element. The emma:grammar element MUST either contain a representation of the grammar inline OR have a ref attribute which contains a URI referencing the grammar used in processing the input. The optional attribute grammar-type on emma:grammar contains a MIME type indicating the format of the specified grammar. For example, an SRGS grammar in the XML format SHOULD be annotated as grammar-type="application/srgs-xml". The namespace of an inline grammar MUST be specified.

In the following example, there are three interpretations. Each interpretation is annotated with emma:grammar-ref to indicate the grammar that resulted in that interpretation. The two emma:grammar elements indicate the URI for the grammar using the ref attribute. Both grammars are SRGS XML grammars and so are annotated as grammar-type="application/srgs-xml".

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml"
      ref="someURI"/>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml"
      ref="anotherURI"/>
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

In the following example, there are two interpretations, each from a different grammar, and the SRGS grammars used to derive the interpretations are specified inline, each as a child of an emma:grammar element. The namespace of the inline grammars is indicated explicitly on each.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml">
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/06/grammar
         http://www.w3.org/TR/speech-grammar/grammar.xsd"
        xml:lang="en" version="1.0" root="state" mode="voice"
        tag-format="semantics/1.0">
      <rule id="state" scope="public">
        <one-of>
          <item>California<tag>out="CA";</tag></item>
          <item>New Jersey<tag>out="NJ";</tag></item>
          <item>New York<tag>out="NY";</tag></item>
        </one-of>
      </rule>
    </grammar>
  </emma:grammar>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml">
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/06/grammar
         http://www.w3.org/TR/speech-grammar/grammar.xsd"
        xml:lang="en" version="1.0" root="city" mode="voice"
        tag-format="semantics/1.0">
      <rule id="city" scope="public">
        <one-of>
          <item>Calgary<tag>out="YYC";</tag></item>
          <item>San Francisco<tag>out="SFO";</tag></item>
          <item>Boston<tag>out="BOS";</tag></item>
        </one-of>
      </rule>
    </grammar>
  </emma:grammar>
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:tokens="California"
        emma:grammar-ref="gram1">
      <emma:literal>CA</emma:literal>
    </emma:interpretation>
    <emma:interpretation emma:tokens="Calgary"
        emma:grammar-ref="gram2">
      <emma:literal>YYC</emma:literal>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

Non-XML grammar formats, such as the SRGS ABNF format, MUST be contained within <![CDATA[ ... ]]>. Care should be taken in platforms generating EMMA to avoid conflicts between id values in the EMMA markup and those in any inline grammars. Authors should be aware that there could be conflicts between id values used in different embedded inline grammars within an EMMA document.
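For instance, a sketch of an in-line ABNF grammar wrapped in a CDATA section (the grammar content and the id are illustrative):

<emma:grammar id="gram3" grammar-type="application/srgs">
  <![CDATA[
    #ABNF 1.0 UTF-8;
    language en;
    mode voice;
    root $city;
    public $city = boston | austin | denver;
  ]]>
</emma:grammar>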

Note that unlike the use of ref on e.g. emma:one-of, it is not possible in EMMA to provide a partial specification of the grammar inline and use emma:partial-content="true" to indicate that the full grammar is available from the URI in ref.

4.1.4 Reference to grammars active: emma:grammar-active element

Annotation: emma:grammar-active
Definition: An element used to indicate the grammars active during the processing of an input.
Children: A list of emma:active elements, one for each grammar currently active.
Attributes:
  • Required:
    • id of type xsd:ID.
Applies to: emma:interpretation, emma:one-of, emma:group, emma:sequence
Annotation: emma:active
Definition: An element specifying a particular grammar active during the processing of an input.
Children: None
Attributes:
  • Required:
    • emma:grammar-ref of type xsd:ID.
Applies to: emma:grammar-active

The default when multiple emma:grammar elements are specified under emma:emma is to assume that all grammars are active for all of the interpretations specified in the top level of the current EMMA document. In certain use cases, such as documents containing results from different microphones or different modalities, this may not be the case, and the set of grammars active for a specific interpretation or set of interpretations should be annotated explicitly using emma:grammar-active. Each grammar which is active is indicated by an emma:active element, which MUST have an emma:grammar-ref annotation pointing to the specific grammar. For example, to make explicit the fact that both grammars, gram1 and gram2, are active for all three of the N-best interpretations in the following example, an emma:grammar-active element appears as a child of the emma:one-of.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml"
      ref="someURI"/>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml"
      ref="anotherURI"/>
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:grammar-active>
      <emma:active emma:grammar-ref="gram1"/>
      <emma:active emma:grammar-ref="gram2"/>
    </emma:grammar-active>
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

The use of an element for each active grammar allows for more complex use cases where specific metadata is associated with each active grammar. For example, a weighting or other parameters associated with each active grammar could be specified within an emma:info within emma:active.
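For instance, a sketch of what such per-grammar metadata could look like, assuming emma:info is permitted within emma:active as this paragraph suggests (the ex:weight element and its values are illustrative):

<emma:grammar-active>
  <emma:active emma:grammar-ref="gram1">
    <emma:info>
      <ex:weight xmlns:ex="http://www.example.com/ex">0.8</ex:weight>
    </emma:info>
  </emma:active>
  <emma:active emma:grammar-ref="gram2">
    <emma:info>
      <ex:weight xmlns:ex="http://www.example.com/ex">0.2</ex:weight>
    </emma:info>
  </emma:active>
</emma:grammar-active>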

4.1.5 Extensibility to application/vendor specific annotations: emma:info element

Annotation: emma:info
Definition: The emma:info element acts as a container for vendor and/or application specific metadata regarding a user's input.
Children: One or more elements in the application namespace providing metadata about the input.
Attributes:
  • Optional:
    • id of type xsd:ID
    • ref of type xsd:anyURI that references a remote document containing a specification of application/vendor specific annotations
    • An emma:partial-content attribute of type xsd:boolean indicating whether the content inside the element is partial and more can be retrieved from an external document through ref
    • indexed of type xsd:boolean indicating whether it has index scope
Applies to: The emma:info element is legal only as a child of the EMMA elements emma:emma, emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, emma:node, or emma:annotation.

In Section 4.2, a series of attributes is defined for the representation of metadata about user inputs in a standardized form. EMMA also provides an extensibility mechanism for annotation of user inputs with vendor or application specific metadata not covered by the standard set of EMMA annotations. The element emma:info MUST be used as a container for these annotations, UNLESS they are explicitly covered by emma:endpoint-info. For example, if an input to a dialog system needed to be annotated with the number that the call originated from, the caller's state, some indication of the type of customer, and the name of the service, these pieces of information could be represented within emma:info as in the following example:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:info id="info_details">
    <caller_id>
      <phone_number>2121234567</phone_number>
      <state>NY</state>
    </caller_id>
    <customer_type>residential</customer_type>
    <service_name>acme_travel_service</service_name>
  </emma:info>
  <emma:one-of
      emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:confidence="0.75">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.68">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

It is important to have an EMMA container element for application/vendor specific annotations since EMMA elements provide a structure for representation of multiple possible interpretations of the input. As a result it is cumbersome to state application/vendor specific metadata as part of the application data within each emma:interpretation. An element is used rather than an attribute so that internal structure can be given to the annotations within emma:info.

In addition to emma:emma, emma:info MAY also appear as a child of other structural elements such as emma:interpretation, emma:one-of and so on. When emma:info appears as a child of one of these elements, the application/vendor specific annotations contained within emma:info are assumed to apply to all of the emma:interpretation elements within the containing element. The semantics of conflicting annotations in emma:info, for example when different values are found within emma:emma and emma:interpretation, are left to the developer of the vendor/application specific annotations.

There may be more than one emma:info element. One function of this is to enable interpretations to indicate which emma:info applies to them using emma:info-ref. If emma:info has the optional id attribute then the emma:info-ref attribute (Section 4.2.19) can be used on emma:interpretation and other container elements to indicate that a particular set of application/vendor specific annotations applies to a particular interpretation.

The emma:info element can therefore have either positional scope (it applies to the element it appears in and the interpretations within it) or index scope, where emma:info-ref attributes are used to show which interpretations a particular emma:info applies to. In order to distinguish emma:info elements that have positional vs. index scope, the indexed attribute MUST be used. The attribute indexed="true" indicates that the emma:info it appears on does not have positional scope and instead is referenced using emma:info-ref. The attribute indexed="false" indicates that an emma:info has positional scope. The default value if indexed is not specified is false. The indexed attribute is required if and only if there is an emma:info-ref that refers to the id of the emma:info.
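For instance, a sketch contrasting the two scopes (the annotation content is illustrative):

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- positional scope: applies to everything under emma:emma -->
  <emma:info indexed="false">
    <service_name>acme_travel_service</service_name>
  </emma:info>
  <!-- index scope: applies only where referenced via emma:info-ref -->
  <emma:info id="info1" indexed="true">
    <engine>recognizer_a</engine>
  </emma:info>
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:info-ref="info1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation>
      <origin>Austin</origin>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>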

The ref attribute can also be used on emma:info instead of placing the application/vendor specific annotations inline. For example, assuming the example above was available at http://example.com/examples/123/emma.xml, the EMMA document delivered to an EMMA consumer could be:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:info ref="http://example.com/examples/123/emma.xml#info_details"/>
  <emma:one-of
      emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:confidence="0.75">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.68">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

A ref on emma:info can also be used to point to an external document, not necessarily an EMMA document, containing additional annotations on the interpretation. For example, it could be used to point to an XML document providing a list of the specifications of the input device.

4.1.6 Endpoint reference: emma:endpoint-info element and emma:endpoint element

Annotation: emma:endpoint-info
Definition: The emma:endpoint-info element acts as a container for all application specific annotation regarding the communication environment.
Children: One or more emma:endpoint elements.
Attributes:
  • Required:
    • id of type xsd:ID.
Applies to: The emma:endpoint-info element is legal only as a child of emma:emma.
Annotation: emma:endpoint
Definition: The element acts as a container for application specific endpoint information.
Children: Elements in the application namespace providing metadata about the input.
Attributes:
  • Required:
    • id of type xsd:ID
  • Optional: emma:endpoint-role, emma:endpoint-address, emma:message-id, emma:port-num, emma:port-type, emma:endpoint-pair-ref, emma:service-name, emma:media-type, emma:medium, emma:mode.
Applies to: emma:endpoint-info

In order to conduct multimodal interaction, there is a need in EMMA to specify the properties of the endpoint that receives the input which leads to the EMMA annotation. This allows subsequent components to utilize the endpoint properties as well as the annotated inputs to conduct meaningful multimodal interaction. The EMMA element emma:endpoint can be used for this purpose. It can specify the endpoint properties based on a set of common endpoint property attributes in EMMA, such as emma:endpoint-address, emma:port-num, emma:port-type, etc. (Section 4.2.14). Moreover, it provides an extensible annotation structure that allows the inclusion of application and vendor specific endpoint properties.

Note that the usage of the term "endpoint" in this context is different from the way that the term is used in speech processing, where it refers to the end of a speech input. As used here, "endpoint" refers to a network location which is the source or recipient of an EMMA document.

In multimodal interaction, multiple devices can be used and each device can open multiple communication endpoints at the same time. These endpoints are used to transmit and receive data, such as raw input, EMMA documents, etc. The EMMA element emma:endpoint provides a generic representation of endpoint information which is relevant to multimodal interaction. It allows the annotation to be interoperable, and it eliminates the need for EMMA processors to create their own specialized annotations for existing protocols, potential protocols, or yet undefined private protocols that they may use.

Moreover, emma:endpoint-info provides a container to hold all annotations regarding the endpoint information, including emma:endpoint and other application and vendor specific annotations that are related to the communication, allowing the same communication environment to be referenced and used in multiple interpretations.

Note that EMMA provides two locations (i.e. emma:info and emma:endpoint-info) for specifying vendor/application specific annotations. If the annotation is specifically related to the description of the endpoint, then the vendor/application specific annotation SHOULD be placed within emma:endpoint-info; otherwise it SHOULD be placed within emma:info.

The following example illustrates the annotation of endpoint reference properties in EMMA.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/emma/port">
  <emma:endpoint-info id="audio-channel-1">
    <emma:endpoint id="endpoint1"
        emma:endpoint-role="sink"
        emma:endpoint-address="135.61.71.103"
        emma:port-num="50204"
        emma:port-type="rtp"
        emma:endpoint-pair-ref="endpoint2"
        emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
        emma:service-name="travel"
        emma:mode="voice">
      <ex:app-protocol>SIP</ex:app-protocol>
    </emma:endpoint>
    <emma:endpoint id="endpoint2"
        emma:endpoint-role="source"
        emma:endpoint-address="136.62.72.104"
        emma:port-num="50204"
        emma:port-type="rtp"
        emma:endpoint-pair-ref="endpoint1"
        emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
        emma:service-name="travel"
        emma:mode="voice">
      <ex:app-protocol>SIP</ex:app-protocol>
    </emma:endpoint>
  </emma:endpoint-info>
  <emma:interpretation
      emma:start="1087995961542" emma:end="1087995963542"
      emma:endpoint-info-ref="audio-channel-1"
      emma:medium="acoustic" emma:mode="voice">
    <destination>Chicago</destination>
  </emma:interpretation>
</emma:emma>

The ex:app-protocol element is provided by the application or the vendor specification. It specifies that the application layer protocol used to establish the speech transmission from the "source" port to the "sink" port is the Session Initiation Protocol (SIP). This is specific to SIP based VoIP communication, in which the actual media transmission and the call signaling that controls the communication sessions are separated and typically based on different protocols. In the above example, the Real-time Transport Protocol (RTP) is used in the media transmission between the source port and the sink port.

4.1.7 Reference to process model used: emma:process-model element

Annotation emma:process-model
Definition An element used to indicate the model used in processing the input. The model must be referenced using the ref attribute, which is URI valued.
Children None.
Attributes
  • Required:
    • id of type xsd:ID.
    • ref of type xsd:anyURI that references a model used in processing the input.
    • type of type xsd:string that indicates the type of model.
Applies to The emma:process-model element is legal only as a child of the emma:emma element.

The model that was used to derive the EMMA result MAY be specified with the emma:process-model annotation defined as an element in the EMMA namespace. The emma:process-model-ref attribute appears on the specific interpretation and references the appropriate emma:process-model element. The emma:process-model element MUST have a ref attribute which contains a URI referencing the model used in processing the input. Unlike emma:grammar, emma:process-model does not allow for inline specification of a model. For each emma:process-model element there MUST be an emma:process-model-ref in the document whose value is the id of that emma:process-model. The emma:process-model element cannot have positional scope.

The emma:process-model element MUST have an attribute type containing a string indicating the type of model referenced. The value of type is drawn from an open set including {svm, crf, neural_network, hmm ...}.

Examples of potential uses of emma:process-model include referencing the model used for handwriting recognition or a text classification model used for natural language understanding. The emma:process-model annotation SHOULD be used for input processing models that are not grammars. Grammars SHOULD be referenced or specified inline using emma:grammar. Some input processing modules may utilize both a recognition model and a grammar. For example, for handwriting recognition of electronic ink a neural network might be used for character recognition while a language model or grammar is used to constrain the word or character sequences recognized. In this case, the neural network SHOULD be referenced using emma:process-model and the grammar or language model using emma:grammar.
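As an illustrative sketch of this combination (the model and grammar URIs here are invented, and the schema declarations are omitted for brevity), a handwriting recognition result might reference both kinds of resource:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" ref="http://example.com/hwr/words.grxml"/>
  <emma:process-model id="pm0" type="neural_network"
      ref="http://example.com/hwr/charnet"/>
  <emma:interpretation
      emma:medium="tactile" emma:mode="ink"
      emma:grammar-ref="gram1"
      emma:process-model-ref="pm0">
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>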

In the following example, there are two interpretations. The EMMA document in this example is produced by a computer vision system doing object recognition. The first interpretation is generated by a process model for vehicle recognition and the second competing interpretation is generated by a process model for person recognition.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:process-model
    id="pm1"
    type="neural_network"
    ref="http://example.com/vision/vehicle"/>
  <emma:process-model
    id="pm2"
    type="neural_network"
    ref="http://example.com/vision/people"/>
  <emma:one-of
    emma:start="1087995961542"
    emma:end="1087995961542"
    emma:medium="visual"
    emma:mode="image"
    emma:process="http://example.com/mycompvision1.xml">
    <emma:interpretation
      emma:confidence="0.9"
      emma:process-model-ref="pm1">
      <object>aircraft</object>
    </emma:interpretation>
    <emma:interpretation
      emma:confidence="0.1"
      emma:process-model-ref="pm2">
      <object>person</object>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

4.1.8 Reference to parameters used by a process: emma:parameters and emma:parameter elements

Annotation emma:parameters
Definition An element used to indicate a set of parameters used to configure a processor used in producing an EMMA result.
Children Any number of emma:parameter elements.
Attributes
  • Required:
    • id of type xsd:ID.
  • Optional:
    • api-ref of type xsd:string.
    • ref of type xsd:anyURI that references a document containing a list of parameters.
    • An emma:partial-content attribute of type xsd:boolean indicating whether the content inside the element is partial and more can be retrieved from an external document through ref.
Applies to The emma:parameters element MAY appear only as a child of the emma:emma, emma:interpretation, emma:one-of, emma:group, and emma:sequence elements.
Annotation emma:parameter
Definition An element used to indicate a specific parameter in the configuration of a processor used in producing an EMMA result.
Children None.
Attributes
  • Required:
    • name of type xsd:string.
    • value of type xsd:string.
  • Optional:
    • api-ref of type xsd:string.
Applies to The emma:parameter element is legal only as a child of the emma:parameters element.

A set of parameters that were used to configure the EMMA processor that produces an EMMA result MAY be specified with the emma:parameters annotation defined as an element in the EMMA namespace. The emma:parameter-ref attribute (Section 4.2.21) appears on the specific emma:interpretation or other container element and references the appropriate emma:parameters element. For example, typical parameters for speech recognition, such as confidence thresholds, speed vs. accuracy, timeouts, and settings for endpointing, can be included in emma:parameters.

For each emma:parameters element there MUST be an emma:parameter-ref in the document whose value is the id of that emma:parameters. The emma:parameters element cannot have positional scope.

The optional attribute api-ref on emma:parameter and emma:parameters specifies the specific API that the name and value of a parameter, or the names and values of the set of parameters, are drawn from. Its value is a string from an open set including: {vxml2.1, vxml2.0, MRCPv2, MRCPv1, html+speech, OpenCV ...}. A parameter's name and value are from the API specified in api-ref on the emma:parameter element, if present. Otherwise, they are from the API specified in api-ref, if present, on the surrounding emma:parameters element. If api-ref is not defined on either emma:parameter or emma:parameters, the API that the name(s) and value(s) are drawn from is undefined.

In the following example, the interpretation is annotated with emma:parameter-ref to indicate the set of processing parameters that resulted in that interpretation. These are contained within an emma:parameters element under emma:emma. The API for the first two parameters is inherited from emma:parameters and is "vxml2.1". The API for the third parameter is vendor specific and specified directly in api-ref on that emma:parameter element.

<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:parameters id="parameters1" api-ref="vxml2.1">
<emma:parameter name="confidencelevel" value=".9"/>
<emma:parameter name="completetimeout" value=".3s"/>
<emma:parameter name="word_confusion_network_confidence" value="YES" api-ref="x-acme-recognizer"/>
</emma:parameters>
<emma:interpretation emma:parameter-ref="parameters1" emma:medium="acoustic" emma:mode="voice" emma:process="http://example.com/asr">
<origin>Boston</origin>
</emma:interpretation>
</emma:emma>

Note that in an EMMA document describing a multimodal input or a derivation with multiple steps there may be multiple different emma:parameters elements specifying the parameters used for each specific mode or processing stage. The relationship between an emma:parameters element and the container element it applies to is captured by the emma:parameter-ref attribute.
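For instance, in a hedged sketch of a multimodal input (the ids, URIs, and application elements here are invented, and the schema declarations are omitted for brevity), separate parameter sets could be referenced from the voice and ink interpretations that are grouped:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:parameters id="asr-params" api-ref="vxml2.1">
    <emma:parameter name="confidencelevel" value=".9"/>
  </emma:parameters>
  <emma:parameters id="ink-params" ref="http://example.com/ink/params.xml"/>
  <emma:group>
    <emma:interpretation emma:parameter-ref="asr-params"
        emma:medium="acoustic" emma:mode="voice">
      <command>zoom</command>
    </emma:interpretation>
    <emma:interpretation emma:parameter-ref="ink-params"
        emma:medium="tactile" emma:mode="ink">
      <area>downtown</area>
    </emma:interpretation>
  </emma:group>
</emma:emma>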

Instead of specifying parameters inline, the ref attribute can be used to provide a URI reference to an external document containing the parameters. This could be either a pointer to an emma:parameters element within an EMMA document, or it can be a reference to a non-EMMA document containing a specification of the parameters. In the following example, the emma:parameters element contains a reference to a separate parameters document.

<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
<emma:parameters id="parameters1" api-ref="vxml2.1" ref="http://example.com/mobile/asr/params.xml">
</emma:parameters>
<emma:interpretation emma:parameter-ref="parameters1" emma:medium="acoustic" emma:mode="voice" emma:process="http://example.com/asr">
<origin>Boston</origin>
</emma:interpretation>
</emma:emma>

4.1.9 Human annotation: emma:annotation element

Annotation emma:annotation
Definition The emma:annotation element acts as a container for annotations of user inputs made by human labellers.
Children One or more elements providing annotations of the input. May also contain a single emma:info element.
Attributes
  • Optional:
    • id of type xsd:ID.
    • annotator of type xsd:string indicating the name or other identifier of the annotator.
    • type of type xsd:string from the open set {transcription, semantics, emotion ...}.
    • time of type xsd:dateTime indicating the time at which the annotation label was made.
    • reference of type xsd:boolean indicating if this annotation is the reference for the current interpretation.
    • emma:confidence, an attribute of type xsd:decimal in the range 0.0 to 1.0, indicating the annotator's confidence in their annotation.
    • srcref, an attribute of type xsd:anyURI used to refer to an annotation outside of the document.
Applies to The emma:annotation element is legal only as a child of the EMMA elements emma:emma, emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, or emma:node.

In many spoken and multimodal applications, at some time after user interactions have taken place, human labellers are used to provide annotation of the input. For example, for speech input the most common annotation is to transcribe the actual words spoken by the user by listening to the audio. The correct semantic interpretation of the input may also be annotated. Labellers may also annotate other aspects of the input such as the emotional state of the user.

To provide support for augmenting logged EMMA documents with human annotations, the EMMA markup provides the emma:annotation element. Multiple instances of this element can appear as children of the various EMMA containers. In examples with emma:one-of and multiple emma:interpretation elements, emma:annotation will generally appear as a child of emma:one-of, as it is an annotation of the signal rather than of the specific interpretation hypotheses encoded in the individual interpretations. The emma:annotation element can also be used to annotate arcs and states in lattices by including it in emma:arc and emma:node.
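As a brief hedged sketch of the lattice case (the lattice content and annotation values are invented for illustration), an annotation could be attached to a single arc of a word lattice:

<emma:interpretation emma:medium="acoustic" emma:mode="voice">
  <emma:lattice initial="1" final="3">
    <emma:arc from="1" to="2">flights</emma:arc>
    <emma:arc from="2" to="3">today
      <emma:annotation annotator="joe_bloggs" type="transcription"
          reference="true">
        <emma:literal>tomorrow</emma:literal>
      </emma:annotation>
    </emma:arc>
  </emma:lattice>
</emma:interpretation>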

In addition to id, the emma:annotation element provides a series of optional attributes that MAY be used to provide metadata regarding the annotation. The annotator attribute contains a string indicating the name or other identifier of the annotator. The type attribute indicates the kind of annotation and has an open set of values {transcription, semantics, emotion ...}. The time attribute on emma:annotation does not have any relation to the time of the input itself; rather it indicates the date and time that the annotation was made. The emma:confidence attribute is a value between 0 and 1 indicating the annotator's confidence in their annotation. The reference attribute is a boolean which indicates whether the annotation it appears on is the reference annotation for the interpretation, as opposed to some other annotation of the input. For example, if the interpretation in the EMMA document is a speech recognition result, the annotation of the reference string SHOULD have reference="true", while an annotation of the emotional state of the user should be annotated as reference="false". Further metadata regarding the annotation can be captured by using emma:info within emma:annotation.

In addition to specifying annotations inline, the srcref attribute on the emma:annotation element can be used to refer to an external document containing the annotation content.

In the following example, the EMMA document contains an N-best list with two recognition hypotheses and their semantic representations. Under emma:one-of there are three different annotations, all made by different annotators on different days and times. The first is the transcription; it indicates that in fact neither of the N-best results was correct and that the actual utterance spoken was "flights from austin to denver tomorrow". The second annotation (label2) contains the annotated semantic interpretation of the reference string. The third annotation contains an additional piece of metadata captured by a human labeller; specifically, it captures the fact that based on the audio, the user's emotional state was angry. Here as an illustration we utilize EmotionML markup.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:start="1087995961542"
      emma:end="1087995963542"
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:verbal="true"
      emma:signal="http://example.com/signals/audio457.wav">
    <emma:interpretation emma:confidence="0.75"
      emma:tokens="flights from boston to denver tomorrow">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.68"
      emma:tokens="flights from austin to denver today">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>today</date>
    </emma:interpretation>
    <emma:annotation id="label1"
      annotator="joe_bloggs"
      time="2011-10-26T21:32:52"
      type="transcription"
      emma:confidence="0.9"
      reference="false">
      <emma:literal>flights from austin to denver tomorrow</emma:literal>
    </emma:annotation>
    <emma:annotation id="label2"
      annotator="mary_smith"
      time="2011-10-27T12:00:21"
      type="semantics"
      emma:confidence="1.0"
      reference="true">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:annotation>
    <emma:annotation id="label3"
      annotator="tim_black"
      time="2011-11-10T09:00:21"
      type="emotion"
      emma:confidence="1.0"
      reference="false">
      <emotionml xmlns="http://www.w3.org/2009/10/emotionml">
        <emotion>
          <category set="everyday" name="angry"/>
          <modality medium="acoustic" mode="voice"/>
        </emotion>
      </emotionml>
    </emma:annotation>
  </emma:one-of>
</emma:emma>

In addition to this more powerful mechanism for adding human annotation to a document, EMMA also provides a shorthand emma:annotated-tokens attribute for the common use case of adding reference transcriptions to an EMMA document (Section 4.2.22).

Note that 'annotation' as used in the emma:annotation element and the emma:annotated-tokens attribute refers only to annotations made in a post process by human labellers to indicate what the correct processing (reference) of an input should have been or to annotate other aspects of the input. This differs from the general sense of annotation as used more broadly in the specification, as in the title "Extensible MultiModal Annotation", which refers in general to metadata provided about an input either by an EMMA processor or by a human labeller. The many annotation elements and attributes in EMMA are used to indicate metadata captured regarding an input. The emma:annotation element and emma:annotated-tokens attribute are specifically for the addition of information provided by human labellers.

Annotations such as the EmotionML in the example above can also be stored in separate files and referenced on an emma:annotation element using ref. Like emma:parameters, a partial specification of the annotation can be provided inline, and emma:partial-content="true" provides an indication that the full annotation can be accessed at ref.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:start="1087995961542"
      emma:end="1087995963542"
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:verbal="true"
      emma:signal="http://example.com/signals/audio457.wav"
      emma:confidence="0.75">
    <emma:interpretation emma:confidence="0.75"
      emma:tokens="flights from boston to denver tomorrow">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
    <emma:annotation
      annotator="tim_black"
      time="2011-11-10T09:00:21"
      type="emotion"
      emma:confidence="1.0"
      reference="false"
      ref="http://example.com/2011/11/10/emotion123.xml"/>
  </emma:one-of>
</emma:emma>

4.1.10 Location: emma:location element

Annotation emma:location
Definition The emma:location element acts as a container for information about the location of a user input, more precisely, information about the location of a capture device such as a mobile device.
Children None.
Attributes
  • Required: id of type xsd:ID.
  • Optional:
    • emma:latitude of type xsd:float in the range -90 to 90, indicating the latitude of the capture device.
    • emma:longitude of type xsd:float in the range -180 to 180, indicating the longitude of the capture device.
    • emma:accuracy of type xsd:float, indicating the accuracy of the position in meters; it MUST be non-negative.
    • emma:altitude of type xsd:float, indicating the height of the capture device in meters.
    • emma:altitudeAccuracy of type xsd:float, indicating the accuracy of the altitude information. This value MUST be non-negative.
    • emma:heading of type xsd:float, in the range 0 ≤ heading < 360, indicating the direction in which the device is moving. The value is empty if the capture device is stationary.
    • emma:speed of type xsd:float in meters/second, indicating the speed at which the capture device is moving. This value MUST be non-negative. The value is the empty string if the device is stationary.
    • emma:description of type xsd:string, providing a description of the location of the device at the beginning of capture of the signal.
    • emma:address of type xsd:string, providing the address of the device at the beginning of capture.
Applies to The emma:location element is legal only as a child of the EMMA elements emma:emma, emma:interpretation, emma:group, emma:one-of, and emma:sequence.

Many mobile devices and sensors are equipped with geolocation capabilities, and information about where a unimodal or multimodal event occurred can be very useful both for interpretation and logging. Annotating interpretations with location information in EMMA is achieved with the emma:location element. The emma:location element indicates the location of the capture device. In many cases the device location and the user location will be identical, as in the case where the user is carrying a mobile device. In other use cases (e.g. cameras capturing distant motion, far field microphone arrays) the user may be distant from the device location. Capturing the location of the user or other source of signal is beyond the scope of the emma:location annotation. Note that emma:location is not intended as a general semantic representation for location information, e.g. a gesture made at a location on a map or a spoken location; these rather are part of the interpretation and should be contained within emma:interpretation rather than the emma:location annotation element. The location information in emma:location represents a point in space. Since a device or sensor may be moving during the capture of an input, the location may not be the same at the beginning and end of an input. For this reason, the emma:location information is defined to be relative to the beginning of the capture. Note though that the bearing of the sensor can be annotated using the emma:heading and emma:speed attributes on emma:location. The emma:location element represents the location of a single capture device. Use cases where multiple input devices or sensors are involved in the capture of the input can be represented as composite inputs with an emma:location element annotation on each of the interpretations that are composed. The Multimodal Interaction Working Group invites comments on use cases that may require a finer-grained representation of location metadata.

The emma:location attributes are based on the W3C Geolocation API [Geolocation] specification, with the addition of attributes for a description of the location and address information. The formats of the attributes from the Geolocation API are as defined in that specification. Specifically, they are:

The geographic coordinate reference system used by the attributes is the World Geodetic System (2d) [WGS84]. No other reference system is supported.

The emma:latitude and emma:longitude attributes are geographic coordinates of the capture device at the beginning of the capture. They MUST be specified in decimal degrees.

The emma:altitude attribute denotes the height of the position at the beginning of the capture. It MUST be specified in meters above the [WGS84] ellipsoid, or as provided by the device's geolocation implementation. If the implementation cannot provide altitude information, the value of this attribute MUST be the empty string.

The emma:accuracy attribute denotes the accuracy of the latitude and longitude coordinates. It MUST be specified in meters. The value of the emma:accuracy attribute MUST be a non-negative real number.

The emma:altitudeAccuracy attribute is specified in meters. If the implementation cannot provide altitude accuracy information, the value of this attribute MUST be the empty string. Otherwise, the value of the emma:altitudeAccuracy attribute MUST be a non-negative real number.

The emma:accuracy and emma:altitudeAccuracy values in an EMMA document SHOULD correspond to a 95% confidence level.

The emma:heading attribute denotes the direction of travel of the capture device at the beginning of the capture, and is specified in degrees, where 0° ≤ heading < 360°, counting clockwise relative to true north. If the implementation cannot provide heading information, the value of this attribute MUST be the empty string. If the capture device is stationary (i.e. the value of the speed attribute is 0), then the value of the emma:heading attribute MUST be the empty string.

The emma:speed attribute denotes the magnitude of the horizontal component of the capture device's velocity at the beginning of the capture, and MUST be specified in meters per second. If the implementation cannot provide speed information, the value of this attribute MUST be the empty string. Otherwise, the value of the emma:speed attribute MUST be a non-negative real number.

The emma:description attribute is an arbitrary string describing the location of the capture device at the beginning of the capture.

The emma:address attribute is an arbitrary string describing the address of the capture device at the beginning of the capture.

The internal formats of the emma:description and the emma:address attributes are not defined in this specification.

The following example shows the location information for an input spoken at the W3C MIT office.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:location id="loc1"
      emma:latitude="42.361860"
      emma:longitude="-71.091840"
      emma:altitude="6.706"
      emma:accuracy="20.5"
      emma:altitudeAccuracy="1.6"
      emma:heading=""
      emma:speed=""
      emma:description="W3C MIT office"
      emma:address="32 Vassar Street, Cambridge, MA 02139 USA"/>
  <emma:interpretation
      emma:medium="acoustic"
      emma:mode="voice"
      emma:tokens="flights from boston to denver">
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>


4.2 EMMA annotation attributes

4.2.1 Tokens of input: emma:tokens attribute

Annotation emma:tokens
Definition An attribute of type xsd:string holding a sequence of input tokens.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data.

The emma:tokens annotation holds a list of input tokens. In the following description, the term tokens is used in the computational and syntactic sense of units of input, and not in the sense of XML tokens. The value held in emma:tokens is the list of the tokens of input as produced by the processor which generated the EMMA document; there is no language associated with this value.

In the case where a grammar is used to constrain input, the value will correspond to tokens as defined by the grammar. So for an EMMA document produced by input to an SRGS grammar [SRGS], the value of emma:tokens will be the list of words and/or phrases that are defined as tokens in SRGS (see Section 2.1 of [SRGS]). Items in the emma:tokens list are delimited by white space and/or quotation marks for phrases containing white space. For example:

emma:tokens="arriving at 'Liverpool Street'"

where the three tokens of input are arriving, at, and Liverpool Street.

The emma:tokens annotation MAY be applied not just to the lexical words and phrases of language but to any level of input processing. Other examples of tokenization include phonemes, ink strokes, gestures, and any other discrete units of input at any level.

Examples:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:tokens="From Cambridge to London tomorrow"
      emma:medium="acoustic" emma:mode="voice">
    <origin emma:tokens="From Cambridge">Cambridge</origin>
    <destination emma:tokens="to London">London</destination>
    <date emma:tokens="tomorrow">20030315</date>
  </emma:interpretation>
</emma:emma>

4.2.2 Reference to processing: emma:process attribute

Annotation emma:process
Definition An attribute of type xsd:anyURI referencing the process used to generate the interpretation.
Applies to emma:interpretation, emma:one-of, emma:group, emma:sequence

A reference to the information concerning the processing that was used for generating an interpretation MAY be made using the emma:process attribute. For example:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw"
        emma:medium="acoustic" emma:mode="voice">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better"
        emma:process="http://example.com/mysemproc1.xml">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
      <emma:derived-from resource="#raw"/>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best"
      emma:process="http://example.com/mysemproc2.xml">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
    <emma:derived-from resource="#better"/>
  </emma:interpretation>
</emma:emma>

The process description document referenced by the emma:process annotation MAY include information on the process itself, such as grammar, type of parser, etc. EMMA is not normative about the format of the process description document.

Note that while the emma:process attribute may refer to a document that describes the process, the URI syntax itself can be used to briefly describe the process within the EMMA document without actually referring to an external document. For example, the results of a natural language understanding component could be annotated as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
    emma:medium="acoustic"
    emma:mode="voice"
    emma:tokens="flights from boston to denver tomorrow please"
    emma:process="http://nlu/classifier=svm&amp;model=travel&amp;output=xml">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
  </emma:interpretation>
</emma:emma>

In this case the emma:process attribute indicates that the process is natural language understanding (nlu), that the classifier used is a support vector machine (svm), that the specific model is the 'travel' model, and that the required output was 'xml'. Note that none of the specific values used within the URI here are standardized. This simply illustrates how a URI can be used to provide a detailed process description.

4.2.3 Lack of input: emma:no-input attribute

Annotation emma:no-input
Definition Attribute holding an xsd:boolean value that is true if there was no input.
Applies to emma:interpretation

The case of lack of input MUST be annotated as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:no-input="true"
      emma:medium="acoustic" emma:mode="voice"/>
</emma:emma>

If the emma:interpretation is annotated with emma:no-input="true" then the emma:interpretation MUST be empty.

4.2.4 Uninterpreted input: emma:uninterpreted attribute

Annotation emma:uninterpreted
Definition Attribute holding an xsd:boolean value that is true if no interpretation was produced in response to the input.
Applies to emma:interpretation

An emma:interpretation element representing input for which no interpretation was produced MUST be annotated with emma:uninterpreted="true". For example:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:uninterpreted="true"
      emma:medium="acoustic" emma:mode="voice"/>
</emma:emma>

The notation for uninterpreted input MAY refer to any possible stage of interpretation processing, including raw transcriptions. For instance, no interpretation would be produced for stages performing pure signal capture such as audio recordings. Likewise, if a spoken input was recognized but could not be parsed by a language understanding component, it can be tagged as emma:uninterpreted as in the following example:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:process="http://example.com/mynlu.xml"
      emma:uninterpreted="true"
      emma:tokens="From Cambridge to London tomorrow"
      emma:medium="acoustic" emma:mode="voice"/>
</emma:emma>

The emma:interpretation MUST be empty if the emma:interpretation element is annotated with emma:uninterpreted="true".

4.2.5 Human language of input: emma:lang attribute

Annotation emma:lang
Definition An attribute of type xsd:language indicating the language for the input.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data.

The emma:lang annotation is used to indicate the human language for the input that it annotates. The values of the emma:lang attribute are language identifiers as defined by IETF Best Current Practice 47 [BCP47]. For example, emma:lang="fr" denotes French, and emma:lang="en-US" denotes US English. emma:lang MAY be applied to any emma:interpretation element. Its annotative scope follows the annotative scope of these elements. Unlike the xml:lang attribute in XML, emma:lang does not specify the language used by element contents or attribute values.

The following example shows the use of emma:lang for annotating an input interpretation.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:lang="fr"
      emma:medium="acoustic" emma:mode="voice">
    <answer>arretez</answer>
  </emma:interpretation>
</emma:emma>

Many kinds of input, including some inputs made through pen, computer vision, and other kinds of sensors, are inherently non-linguistic. Examples include drawing areas, arrows, etc. using a pen, and music input for tune recognition. If these non-linguistic inputs are annotated with emma:lang then they MUST be annotated as emma:lang="zxx". For example, pen input where a user circles an area on a map display could be represented as follows, where emma:lang="zxx" indicates that the ink input is not in any human language.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="tactile"
      emma:mode="ink"
      emma:lang="zxx">
    <location>
      <type>area</type>
      <points>42.1345 -37.128 42.1346 -37.120 ... </points>
    </location>
  </emma:interpretation>
</emma:emma>

If inputs for which there is no information about whether the source input is in a particular human language, and if so which language, are annotated with emma:lang, then they MUST be annotated as emma:lang="". Furthermore, in cases where there is no explicit emma:lang annotation, and none is inherited from a higher element in the document, the default value for emma:lang is "", meaning that there is no information about whether the source input is in a language and if so which language.

The xml:lang and emma:lang attributes serve uniquely different and equally important purposes. The role of the xml:lang attribute in XML 1.0 is to indicate the language used for character data content in an XML element or document. In contrast, the emma:lang attribute is used to indicate the language employed by a user when entering an input. Critically, emma:lang annotates the language of the signal originating from the user rather than the specific tokens used at a particular stage of processing. This is most clearly illustrated through consideration of an example involving multiple stages of processing of a user input. Consider the following scenario: EMMA is being used to represent three stages in the processing of a spoken input to a system for ordering products. The user input is in Italian; after speech recognition, the user input is first translated into English, and then a natural language understanding system converts the English translation into a product ID (which is not in any particular language). Since the input signal is a user speaking Italian, the emma:lang will be emma:lang="it" on all three of these stages of processing. The xml:lang attribute, in contrast, will initially be "it"; after translation the xml:lang will be "en-US", and after language understanding it will be "zxx" since the product ID is non-linguistic content. The following are examples of EMMA documents corresponding to these three processing stages, abbreviated to show the critical attributes for discussion here. Note that <transcription>, <translation>, and <understanding> are application namespace elements, not part of the EMMA markup.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
    <transcription xml:lang="it">condizionatore</transcription>
  </emma:interpretation>
</emma:emma>

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
    <translation xml:lang="en-US">air conditioner</translation>
  </emma:interpretation>
</emma:emma>

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic">
    <understanding xml:lang="zxx">id1456</understanding>
  </emma:interpretation>
</emma:emma>

In order to handle inputs involving multiple languages, such as through code switching, the emma:lang attribute MAY contain several language identifiers separated by spaces.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:tokens="please stop arretez s'il vous plait"
      emma:lang="en fr"
      emma:medium="acoustic" emma:mode="voice">
    <command> CANCEL </command>
  </emma:interpretation>
</emma:emma>

4.2.6 Reference to signal: emma:signal and emma:signal-size attributes

Annotation emma:signal
Definition An attribute of type xsd:anyURI referencing the input signal.
Applies to emma:interpretation, emma:one-of, emma:group, emma:sequence, and application instance data.
Annotation emma:signal-size
Definition An attribute of type xsd:nonNegativeInteger specifying the size in eight-bit octets of the referenced source.
Applies to emma:interpretation, emma:one-of, emma:group, emma:sequence, and application instance data.

A URI reference to the signal that originated the input recognition process MAY be represented in EMMA using the emma:signal annotation. For example, in the case of speech recognition, the emma:signal attribute is the annotation used to reference the audio that was recognized. The MIME type of the audio can be indicated using emma:media-type.

Here is an example where the reference to a speech signal is represented using the emma:signal annotation on the emma:interpretation element:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:signal="http://example.com/signals/sg23.bin"
      emma:medium="acoustic" emma:mode="voice">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
  </emma:interpretation>
</emma:emma>

The emma:signal-size annotation can be used to declare the exact size of the associated signal in 8-bit octets. An example of the use of an EMMA document to represent a recording, with emma:signal-size indicating the size, is as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="recording"
      emma:uninterpreted="true"
      emma:signal="http://example.com/signals/recording.mpg"
      emma:signal-size="82102"
      emma:duration="10000"/>
</emma:emma>

4.2.7 Media type: emma:media-type attribute

Annotation emma:media-type
Definition An attribute of type xsd:string holding the MIME type associated with the signal's data format.
Applies to emma:interpretation, emma:one-of, emma:group, emma:sequence, emma:endpoint, and application instance data.

The data format of the signal that originated the input MAY be represented in EMMA using the emma:media-type annotation. An initial set of MIME media types is defined by [RFC2046].

Here is an example where the media type for the ETSI ES 202 212 audio codec for Distributed Speech Recognition (DSR) is applied to the emma:interpretation element. The example also specifies an optional sampling rate of 8 kHz and maxptime of 40 milliseconds.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:signal="http://example.com/signals/signal.dsr"
      emma:media-type="audio/dsr-es202212; rate:8000; maxptime:40"
      emma:medium="acoustic" emma:mode="voice">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
  </emma:interpretation>
</emma:emma>

4.2.8 Confidence scores: emma:confidence attribute

Annotation emma:confidence
Definition An attribute of type xsd:decimal in range 0.0 to 1.0, indicating the processor's confidence in the result.
Applies to emma:interpretation, emma:one-of, emma:group, emma:sequence, emma:annotation, and application instance data.

The confidence score in EMMA is used to indicate the processor's or annotator's confidence in the assignment of the interpretation to the input, and if confidence is annotated on an input it MUST be given as the value of emma:confidence. The confidence score MUST be a number in the range from 0.0 to 1.0 inclusive. A value of 0.0 indicates minimum confidence, and a value of 1.0 indicates maximum confidence. Note that emma:confidence represents not only the confidence of the speech recognizer, but more generally the confidence of whatever processor was responsible for creating the EMMA result, based on whatever evidence it has. For a natural language interpretation, for example, this might include semantic heuristics in addition to speech recognition scores. Moreover, the confidence score values do not have to be interpreted as probabilities. In fact confidence score values are platform-dependent, since their computation is likely to differ between platforms and different EMMA processors. Confidence scores are annotated explicitly in EMMA in order to provide this information to the subsequent processes for multimodal interaction. The example below illustrates how confidence scores are annotated in EMMA.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:confidence="0.6">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.4">
      <location>Austin</location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

In addition to its use as an attribute on the EMMA interpretation and container elements, the emma:confidence attribute MAY also be used to assign confidences to elements in instance data in the application namespace. This can be seen in the following example, where the <destination> and <origin> elements have confidences.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:confidence="0.6"
      emma:medium="acoustic" emma:mode="voice">
    <destination emma:confidence="0.8">Boston</destination>
    <origin emma:confidence="0.6">Austin</origin>
  </emma:interpretation>
</emma:emma>

Although in general instance data can be represented in XML using a combination of elements and attributes in the application namespace, EMMA does not provide a standard way to annotate processors' confidences in attributes. Consequently, instance data that is expected to be assigned confidences SHOULD be represented using elements, as in the above example.

4.2.9 Input source: emma:source attribute

Annotation emma:source
Definition An attribute of type xsd:anyURI referencing the source of input.
Applies to emma:interpretation, emma:one-of, emma:group, emma:sequence, and application instance data.

The source of an interpreted input MAY be represented in EMMA as a URI resource using the emma:source annotation. Here is an example that shows different input sources for different input interpretations.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:myapp="http://www.example.com/myapp">
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation
        emma:source="http://example.com/microphone/NC-61">
      <myapp:destination>Boston</myapp:destination>
    </emma:interpretation>
    <emma:interpretation
        emma:source="http://example.com/microphone/NC-4024">
      <myapp:destination>Austin</myapp:destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

4.2.10 Timestamps

The start and end times for input MAY be indicated using either absolute timestamps or relative timestamps. Both are in milliseconds for ease in processing timestamps. Note that the ECMAScript Date object's getTime() function is a convenient way to determine the absolute time.

4.2.10.1 Absolute timestamps: emma:start, emma:end attributes

Annotation emma:start, emma:end
Definition Attributes of type xsd:nonNegativeInteger indicating the absolute starting and ending times of an input in terms of the number of milliseconds since 1 January 1970 00:00:00 GMT.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, and application instance data

Here is an example of a timestamp for an absolute time.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:start="1087995961542"
      emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <destination>Chicago</destination>
  </emma:interpretation>
</emma:emma>

The emma:start and emma:end annotations on an input MAY be identical; however, the emma:end value MUST NOT be less than the emma:start value.
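For instance, a point-like input such as a single button press can be given identical start and end times. This is an illustrative sketch (the application element is invented, and the schema declarations are omitted for brevity):

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:start="1087995961542"
      emma:end="1087995961542"
      emma:medium="tactile" emma:mode="gui">
    <button>confirm</button>
  </emma:interpretation>
</emma:emma>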

4.2.10.2 Relative timestamps: emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start attributes

Annotation emma:time-ref-uri
Definition Attribute of type xsd:anyURI indicating the URI used to anchor the relative timestamp.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:lattice, and application instance data
Annotation emma:time-ref-anchor-point
Definition Attribute with a value of start or end, defaulting to start. It indicates whether to measure the time from the start or end of the interval designated with emma:time-ref-uri.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:lattice, and application instance data
Annotation emma:offset-to-start
Definition Attribute of type xsd:integer, defaulting to zero. It specifies the offset in milliseconds for the start of input from the anchor point designated with emma:time-ref-uri and emma:time-ref-anchor-point.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, and application instance data

Relative timestamps define the start of an input relative to the start or end of a reference interval such as another input.

[Figure: relative timestamps]

The reference interval is designated with the emma:time-ref-uri attribute. This MAY be combined with the emma:time-ref-anchor-point attribute to specify whether the anchor point is the start or end of this interval. The start of an input relative to this anchor point is then specified with the emma:offset-to-start attribute.

Here is an example where the referenced input is in the same document:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:sequence>
    <emma:interpretation id="int1"
        emma:medium="acoustic" emma:mode="voice">
      <origin>Denver</origin>
    </emma:interpretation>
    <emma:interpretation
        emma:medium="acoustic" emma:mode="voice"
        emma:time-ref-uri="#int1"
        emma:time-ref-anchor-point="start"
        emma:offset-to-start="5000">
      <destination>Chicago</destination>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>

Note that the reference point refers to an input, but not necessarily to a complete input. For example, if a speech recognizer timestamps each word in an utterance, the anchor point might refer to the timestamp for just one word.

The absolute and relative timestamps are not mutually exclusive; that is, it is possible to have both relative and absolute timestamp attributes on the same EMMA container element.
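For example (an illustrative fragment reusing the id from the example above), an interpretation might carry both absolute emma:start/emma:end values and a relative timestamp anchored to another input:

<emma:interpretation
    emma:start="1087995966542"
    emma:end="1087995968542"
    emma:time-ref-uri="#int1"
    emma:time-ref-anchor-point="start"
    emma:offset-to-start="5000"
    emma:medium="acoustic" emma:mode="voice">
  <destination>Chicago</destination>
</emma:interpretation>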

Timestamps of inputs collected by different devices will be subject to variation if the times maintained by the devices are not synchronized. This concern is outside of the scope of the EMMA specification.

4.2.10.3 Duration of input: emma:duration attribute

Annotation emma:duration
Definition Attribute of type xsd:nonNegativeInteger, defaulting to zero. It specifies the duration of the input in milliseconds.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, and application instance data

The duration of an input in milliseconds MAY be specified with the emma:duration attribute. The emma:duration attribute MAY be used either in combination with timestamps or independently, for example in the annotation of speech corpora.

In the following example, the duration of the signal that gave rise to the interpretation is indicated using emma:duration.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:duration="2300"
      emma:medium="acoustic" emma:mode="voice">
    <origin>Denver</origin>
  </emma:interpretation>
</emma:emma>

4.2.10.4 Composite Input and Relative Timestamps

This section is informative.

The following table provides guidance on how to determine the values of relative timestamps on a composite input.

Informative Guidance on Relative Timestamps in Composite Derivations
emma:time-ref-uri If the reference interval URI is the same for both inputs then it should be the same for the composite input. If it is not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp.
emma:time-ref-anchor-point If the anchor value is the same for both inputs then it should be the same for the composite input. If it is not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp.
emma:offset-to-start Given that the emma:time-ref-uri and emma:time-ref-anchor-point are the same for both combining inputs, the emma:offset-to-start for the combination should be the lesser of the two. If they are not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp.
emma:duration Given that the emma:time-ref-uri and emma:time-ref-anchor-point are the same for both combining inputs, the emma:duration is calculated as follows. Add together the emma:offset-to-start and emma:duration for each of the inputs. Take whichever of these is greater and subtract from it the lesser of the emma:offset-to-start values in order to determine the combined duration. If emma:time-ref-uri and emma:time-ref-anchor-point are not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp.
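As an informative worked illustration of these rules (the numbers here are invented): suppose two combining inputs share the same emma:time-ref-uri and emma:time-ref-anchor-point, with emma:offset-to-start values of 1000 and 2500 and emma:duration values of 2000 and 1500 respectively. The offset plus duration sums are 1000 + 2000 = 3000 and 2500 + 1500 = 4000. The combined emma:offset-to-start is the lesser offset, 1000, and the combined emma:duration is the greater sum minus the lesser offset: 4000 - 1000 = 3000 milliseconds.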

4.2.11 Medium, mode, and function of user inputs: emma:medium, emma:mode, emma:function, emma:verbal attributes

Annotation emma:medium
Definition An attribute of type xsd:nmtokens which contains a space delimited set of values from the set {acoustic, tactile, visual}.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:endpoint, and application instance data
Annotation emma:mode
Definition An attribute of type xsd:nmtokens which contains a space delimited set of values from an open set of values including: {voice, dtmf, ink, gui, keys, video, photograph, ...}.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:endpoint, and application instance data
Annotation emma:function
Definition An attribute of type xsd:string constrained to values in the open set {recording, transcription, dialog, verification, ...}.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data
Annotation emma:verbal
Definition An attribute of type xsd:boolean.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data
Annotation emma:device-type
Definition The type of device, or list of types of devices, through which the input is captured. An attribute of type xsd:nmtokens which contains a space delimited set of values from an open set of values including: {microphone, touchscreen, mouse, keypad, keyboard, pen, joystick, touchpad, scanner, camera_2d, camera_3d, thumbwheel ...}.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data
Annotation emma:expressed-through
Definition The modality, or list of modalities, through which the interpretation is expressed. An attribute of type xsd:nmtokens which contains a space delimited set of values from an open set of values including: {gaze, face, head, torso, hands, leg, locomotion, posture, physiology, ...}.
Applies to emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data

EMMA provides two properties for the annotation of input modality: one indicating the broader medium or channel (emma:medium) and another indicating the specific mode of communication used on that channel (emma:mode). The input medium is defined from the user's perspective and indicates whether they use their voice (acoustic), touch (tactile), or visual appearance/motion (visual) as input. Tactile includes most hands-on input device types such as pen, mouse, keyboard, and touchscreen. Visual is used for camera input.

emma:medium = space delimited sequence of values from the set:
              [acoustic|tactile|visual]

The mode property provides the ability to distinguish between different modes of communication that may be used within a particular medium. For example, in the tactile medium, modes include electronic ink (ink), and pointing and clicking on a graphical user interface (gui).

emma:mode = space delimited sequence of values from the set:
            [voice|dtmf|ink|gui|keys|video|photograph| ... ]

The emma:medium classification is based on the boundary between the user and the device that they use. For emma:medium="tactile" the user physically touches the device in order to provide input. For emma:medium="visual" the user's movement is captured by sensors (cameras, infrared), resulting in an input to the system. In the case where emma:medium="acoustic" the user provides input to the system by producing an acoustic signal. Note then that DTMF input will be classified as emma:medium="tactile", since in order to provide DTMF input the user physically presses keys on a keypad.

In order to clarify the difference between emma:medium and emma:mode, consider the following examples of different ways to capture drawn input. If the user input consists of drawing, it will be classified as emma:mode="ink". If the user physically draws on a touch-sensitive screen, then the input is classified as emma:medium="tactile", since the user interacts with the system by direct contact. If instead the user draws on a tabletop and their input is captured by a camera mounted above (or below) the surface, then the input is emma:medium="visual". Similarly, drawing on a large screen display using hand gestures made in space and sensed with a camera will be classified as emma:mode="ink" and emma:medium="visual".

While emma:medium and emma:mode are optional on specific elements such as emma:interpretation and emma:one-of, note that all EMMA interpretations must be annotated for emma:medium and emma:mode: either these attributes must appear directly on emma:interpretation, or they must appear on an ancestor emma:one-of node, or they must appear on an earlier stage of the derivation listed in emma:derivation. The second option is sketched below.
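For instance, rather than repeating the annotations on each alternative, emma:medium and emma:mode can be stated once on the containing emma:one-of and are then understood to apply to every child interpretation. A minimal sketch, in which the application elements and confidence values are illustrative:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <!-- both alternatives inherit emma:medium="acoustic"
         and emma:mode="voice" from the emma:one-of -->
    <emma:interpretation emma:confidence="0.6">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.4">
      <location>Austin</location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>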

The emma:device-type annotation can be used to indicate the specific type of device used to capture the input. This allows for differentiation of multiple different tactile inputs within the ink mode, such as touchscreen, pen, and mouse input.

emma:device-type = space delimited sequence of values from the set:
                   [microphone|keypad|keyboard|touchscreen|touchpad|
                    mouse|pen|joystick|thumbwheel|
                    camera_2d|camera_3d|scanner| ... ]

The emma:device-type attribute SHOULD be used to indicate the general category of the sensor used to capture the input. The specific model number or characteristics SHOULD instead be captured using emma:process (Section 4.2.2).

Orthogonal to the mode, user inputs can also be classified with respect to their communicative function. Factoring function out in this way keeps the mode classification itself simpler.

emma:function = [recording|transcription|dialog|verification| ... ]

For example, speech can be used for recording (e.g. voicemail), transcription (e.g. dictation), dialog (e.g. interactive spoken dialog systems), and verification (e.g. identifying users through their voiceprints).

EMMA also supports an additional property, emma:verbal, which distinguishes verbal use of an input mode from non-verbal. This MAY be used to distinguish the use of electronic ink to convey handwritten commands from the use of electronic ink for symbolic gestures such as circles and arrows. Handwritten commands, such as writing downtown in order to change a map display to show the downtown, are classified as verbal (emma:function="dialog" emma:verbal="true"). Pen gestures (arrows, lines, circles, etc.), such as circling a building, are classified as non-verbal dialog (emma:function="dialog" emma:verbal="false"). The use of handwritten words to transcribe an email message is classified as transcription (emma:function="transcription" emma:verbal="true").

emma:verbal = [true|false]

Handwritten words and ink gestures are typically recognized using different kinds of recognition components (handwriting recognizer vs. gesture recognizer), and the verbal annotation will be added by the recognition component which classifies the input. The original input source, a pen in this case, will not be aware of this difference. The input source identifier will tell you that the input was from a pen of some kind, but will not tell you whether the mode of input was handwriting (e.g. show downtown) or gesture (e.g. circling an object or area).

Here is an example of the EMMA annotation for a pen input where the user's ink is recognized as either a word ("Boston") or as an arrow:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of>
    <emma:interpretation
        emma:confidence="0.6"
        emma:medium="tactile"
        emma:mode="ink"
        emma:device-type="pen"
        emma:function="dialog"
        emma:verbal="true">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation
        emma:confidence="0.4"
        emma:medium="tactile"
        emma:mode="ink"
        emma:device-type="pen"
        emma:function="dialog"
        emma:verbal="false">
      <direction>45</direction>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

Here is an example of the EMMA annotation for a spoken command which is recognized as either "Boston" or "Austin":

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of>
    <emma:interpretation
        emma:confidence="0.6"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:device-type="microphone"
        emma:function="dialog"
        emma:verbal="true">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation
        emma:confidence="0.4"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:device-type="microphone"
        emma:function="dialog"
        emma:verbal="true">
      <location>Austin</location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

The following table shows the relationship between the medium, mode, and function properties and serves as an aid for classifying inputs. For the dialog function it also shows some examples of the classification of inputs as verbal vs. non-verbal.

Medium / Device-type / Mode, with Function examples (recording; dialog; transcription; verification):

acoustic / microphone / voice
  recording: audiofile (e.g. voicemail)
  dialog: spoken command / query / response (verbal = true); singing a note (verbal = false)
  transcription: dictation
  verification: speaker recognition

tactile / keypad / dtmf
  recording: audiofile / character stream
  dialog: typed command / query / response (verbal = true); command key "Press 9 for sales" (verbal = false)
  transcription: text entry (T9-tegic, word completion, or word grammar)
  verification: password / pin entry

tactile / keyboard / keys
  recording: character / key-code stream
  dialog: typed command / query / response (verbal = true); command key "Press S for sales" (verbal = false)
  transcription: typing
  verification: password / pin entry

tactile / pen / ink
  recording: trace, sketch
  dialog: handwritten command / query / response (verbal = true); gesture (e.g. circling building) (verbal = false)
  transcription: handwritten text entry
  verification: signature, handwriting recognition

tactile / pen / gui
  recording: N/A
  dialog: tapping on named button (verbal = true); drag and drop, tapping on map (verbal = false)
  transcription: soft keyboard
  verification: password / pin entry

tactile / touchscreen / ink
  recording: trace, sketch
  dialog: handwritten command / query / response (verbal = true); gesture (e.g. circling building) (verbal = false)
  transcription: handwritten text entry
  verification: signature, handwriting recognition

tactile / touchscreen / gui
  recording: N/A
  dialog: tapping on named button (verbal = true); drag and drop, tapping on map (verbal = false)
  transcription: soft keyboard
  verification: password / pin entry

tactile / mouse / ink
  recording: trace, sketch
  dialog: handwritten command / query / response (verbal = true); gesture (e.g. circling building) (verbal = false)
  transcription: handwritten text entry
  verification: N/A

tactile / mouse / gui
  recording: N/A
  dialog: clicking named button (verbal = true); drag and drop, clicking on map (verbal = false)
  transcription: soft keyboard
  verification: password / pin entry

tactile / joystick / ink
  recording: trace, sketch
  dialog: gesture (e.g. circling building) (verbal = false)
  transcription: N/A
  verification: N/A

tactile / joystick / gui
  recording: N/A
  dialog: pointing, clicking button / menu (verbal = false)
  transcription: soft keyboard
  verification: password / pin entry

visual / scanner / photograph
  recording: image
  dialog: handwritten command / query / response (verbal = true); drawings and images (verbal = false)
  transcription: optical character recognition, object/scene recognition (markup, e.g. SVG)
  verification: N/A

visual / camera_2d / photograph
  recording: image
  dialog: objects (verbal = false)
  transcription: visual object/scene recognition
  verification: face id, retinal scan

visual / camera_2d / video
  recording: movie
  dialog: sign language (verbal = true); face / hand / arm / body gesture (e.g. pointing, facing) (verbal = false)
  transcription: audio/visual recognition
  verification: face id, gait id, retinal scan

The emma:expressed-through attribute describes the modality through which an input is produced, usually by a human being. This differs from the specific mode of communication (emma:mode) and the broader channel or medium (emma:medium). For example, in the case where a user provides ink input on a touchscreen using their hands, the input would be classified as emma:medium="tactile", emma:mode="ink", and emma:expressed-through="hands". The emma:expressed-through attribute is not specific about the sensors used for observing the modality; these can be specified using the emma:medium and emma:mode attributes.

This mechanism allows for more fine-grained annotation of the specific body part that is analyzed in the assignment of an EMMA result. For example, in an emotion recognition task using computer vision techniques on video camera input, emma:medium="visual" and emma:mode="video". If the face is being analyzed to determine the result, then emma:expressed-through="face", while if the body motion is being analyzed, then emma:expressed-through="locomotion". The face case is sketched below.
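A minimal sketch of the face case follows; the <emotion> element, confidence value, and device type are illustrative application markup rather than part of the specification:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="visual"
      emma:mode="video"
      emma:device-type="camera_2d"
      emma:expressed-through="face"
      emma:confidence="0.8">
    <!-- result of analyzing the face region of the video input -->
    <emotion>surprise</emotion>
  </emma:interpretation>
</emma:emma>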

The list of values provided covers a broad range of modalities through which inputs may be expressed. These values SHOULD be used if they are appropriate. The list is an open set in order to allow for more fine-grained distinctions such as "eyes" vs. "mouth", etc.

4.2.12 Composite multimodality: emma:hook attribute

Annotation: emma:hook
Definition: An attribute of type xsd:string constrained to values in the open set {voice, dtmf, ink, gui, keys, video, photograph, ...} or the wildcard any
Applies to: Application instance data

The attribute emma:hook MAY be used to mark the elements in the application semantics within an emma:interpretation which are expected to be integrated with content from input in another mode to yield a complete interpretation. The emma:mode to be integrated at that point in the application semantics is indicated as the value of the emma:hook attribute. The possible values of emma:hook are the list of input modes that can be values of emma:mode (see Section 4.2.11). In addition to these, the value of emma:hook can also be the wildcard any, indicating that the other content can come from any source. The annotation emma:hook differs in semantics from emma:mode as follows: annotating an element in the application semantics with emma:mode="ink" indicates that that part of the semantics came from the ink mode, while annotating an element in the application semantics with emma:hook="ink" indicates that that part of the semantics needs to be integrated with content from the ink mode.

To illustrate the use of emma:hook, consider an example composite input in which the user says "zoom in here" in the speech input mode while drawing an area on a graphical display in the ink input mode. The fact that the location element needs to come from the ink mode is indicated by annotating this application namespace element using emma:hook:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <command>
      <action>zoom</action>
      <location emma:hook="ink">
        <type>area</type>
      </location>
    </command>
  </emma:interpretation>
</emma:emma>

For a more detailed explanation of this example, see Appendix C.

4.2.13 Cost: emma:cost attribute

Annotation: emma:cost
Definition: An attribute of type xsd:decimal in the range 0.0 to 10000000, indicating the processor's cost or weight associated with an input or part of an input.
Applies to: emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, emma:node, and application instance data

The cost annotation in EMMA indicates the weight or cost associated with a user's input or part of their input. The most common use of emma:cost is for representing the costs encoded on a lattice output from speech recognition or other recognition or understanding processes. emma:cost MAY also be used to indicate the total cost associated with particular recognition results or semantic interpretations.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:cost="1600">
      <location>Boston</location>
    </emma:interpretation>
    <emma:interpretation emma:cost="400">
      <location>Austin</location>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

4.2.14 Endpoint properties: emma:endpoint-role, emma:endpoint-address, emma:port-type, emma:port-num, emma:message-id, emma:service-name, emma:endpoint-pair-ref, emma:endpoint-info-ref attributes

Annotation: emma:endpoint-role
Definition: An attribute of type xsd:string constrained to values in the set {source, sink, reply-to, router}.
Applies to: emma:endpoint

Annotation: emma:endpoint-address
Definition: An attribute of type xsd:anyURI that uniquely specifies the network address of the emma:endpoint.
Applies to: emma:endpoint

Annotation: emma:port-type
Definition: An attribute of type xsd:QName that specifies the type of the port.
Applies to: emma:endpoint

Annotation: emma:port-num
Definition: An attribute of type xsd:nonNegativeInteger that specifies the port number.
Applies to: emma:endpoint

Annotation: emma:message-id
Definition: An attribute of type xsd:anyURI that specifies the message ID associated with the data.
Applies to: emma:endpoint

Annotation: emma:service-name
Definition: An attribute of type xsd:string that specifies the name of the service.
Applies to: emma:endpoint

Annotation: emma:endpoint-pair-ref
Definition: An attribute of type xsd:anyURI that specifies the pairing between sink and source endpoints.
Applies to: emma:endpoint

Annotation: emma:endpoint-info-ref
Definition: An attribute of type xsd:IDREF referring to the id attribute of an emma:endpoint-info element.
Applies to: emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data

The emma:endpoint-role attribute specifies the role that the particular emma:endpoint performs in multimodal interaction. The role value sink indicates that the particular endpoint is the receiver of the input data. The role value source indicates that the particular endpoint is the sender of the input data. The role value reply-to indicates that the particular emma:endpoint is the intended endpoint for the reply. The same emma:endpoint-address MAY appear in multiple emma:endpoint elements, provided that the same endpoint address is used to serve multiple roles, e.g. sink, source, reply-to, router, etc., or is associated with multiple interpretations.

The emma:endpoint-address attribute specifies the network address of the emma:endpoint, and emma:port-type specifies the port type of the emma:endpoint. The emma:port-num attribute annotates the port number of the endpoint (e.g. the typical port number for an HTTP endpoint is 80). The emma:message-id attribute annotates the message ID information associated with the annotated input. This meta-information is used to establish and maintain the communication context for both inbound processing and outbound operation. The service specification of the emma:endpoint is annotated by emma:service-name, which contains the definition of the service that the emma:endpoint performs. The matching of the sink endpoint and its paired source endpoint is annotated by the emma:endpoint-pair-ref attribute. One sink endpoint MAY link to multiple source endpoints through emma:endpoint-pair-ref. Further bounding of the emma:endpoint is possible by using the annotation of emma:group (see Section 3.3.2).

The emma:endpoint-info-ref attribute associates the EMMA result in the container element with an emma:endpoint-info element.

The following example illustrates the use of these attributes in multimodal interactions where multiple modalities are used.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/emma/port">
  <emma:endpoint-info id="audio-channel-1">
    <emma:endpoint id="endpoint1"
        emma:endpoint-role="sink"
        emma:endpoint-address="135.61.71.103"
        emma:port-num="50204"
        emma:port-type="rtp"
        emma:endpoint-pair-ref="endpoint2"
        emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
        emma:service-name="travel"
        emma:mode="voice">
      <ex:app-protocol>SIP</ex:app-protocol>
    </emma:endpoint>
    <emma:endpoint id="endpoint2"
        emma:endpoint-role="source"
        emma:endpoint-address="136.62.72.104"
        emma:port-num="50204"
        emma:port-type="rtp"
        emma:endpoint-pair-ref="endpoint1"
        emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
        emma:service-name="travel"
        emma:mode="voice">
      <ex:app-protocol>SIP</ex:app-protocol>
    </emma:endpoint>
  </emma:endpoint-info>
  <emma:endpoint-info id="ink-channel-1">
    <emma:endpoint id="endpoint3"
        emma:endpoint-role="sink"
        emma:endpoint-address="http://emma.example/sink"
        emma:endpoint-pair-ref="endpoint4"
        emma:port-num="80"
        emma:port-type="http"
        emma:message-id="uuid:2e5678"
        emma:service-name="travel"
        emma:mode="ink"/>
    <emma:endpoint id="endpoint4"
        emma:endpoint-role="source"
        emma:endpoint-address="http://emma.example/source"
        emma:endpoint-pair-ref="endpoint3"
        emma:port-num="80"
        emma:port-type="http"
        emma:message-id="uuid:2e5678"
        emma:service-name="travel"
        emma:mode="ink"/>
  </emma:endpoint-info>
  <emma:group>
    <emma:interpretation emma:start="1087995961542"
        emma:end="1087995963542"
        emma:endpoint-info-ref="audio-channel-1"
        emma:medium="acoustic" emma:mode="voice">
      <destination>Chicago</destination>
    </emma:interpretation>
    <emma:interpretation emma:start="1087995961542"
        emma:end="1087995963542"
        emma:endpoint-info-ref="ink-channel-1"
        emma:medium="tactile" emma:mode="ink">
      <location>
        <type>area</type>
        <points>34.13 -37.12 42.13 -37.12 ... </points>
      </location>
    </emma:interpretation>
  </emma:group>
</emma:emma>

4.2.15 Reference to emma:grammar element: emma:grammar-ref attribute

Annotation: emma:grammar-ref
Definition: An attribute of type xsd:IDREF referring to the id attribute of an emma:grammar element.
Applies to: emma:interpretation, emma:group, emma:one-of, emma:sequence, and emma:active

The emma:grammar-ref attribute associates the EMMA result in the container element with an emma:grammar element. The emma:grammar-ref attribute is also used on emma:active elements within emma:grammar-active in order to indicate which grammars are active during the processing of an input (Section 4.1.4).

The following example shows the use of emma:grammar-ref on the container element emma:interpretation and on the emma:active element:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml" ref="someURI"/>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml" ref="anotherURI"/>
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:grammar-active>
      <emma:active emma:grammar-ref="gram1"/>
      <emma:active emma:grammar-ref="gram2"/>
    </emma:grammar-active>
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

4.2.16 Reference to emma:model element: emma:model-ref attribute

Annotation: emma:model-ref
Definition: An attribute of type xsd:IDREF referring to the id attribute of an emma:model element.
Applies to: emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data

The emma:model-ref annotation associates the EMMA result in the container element with an emma:model element.

Example:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:model id="model1" ref="someURI"/>
  <emma:model id="model2" ref="anotherURI"/>
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:model-ref="model1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation emma:model-ref="model1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation emma:model-ref="model2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

4.2.17 Dialog turns: emma:dialog-turn attribute

Annotation: emma:dialog-turn
Definition: An attribute of type xsd:string referring to the dialog turn associated with a given container element.
Applies to: emma:interpretation, emma:group, emma:one-of, and emma:sequence

The emma:dialog-turn annotation associates the EMMA result in the container element with a dialog turn. The syntax and semantics of dialog turns are left open to suit the needs of individual applications. For example, some applications might use an integer value, where successive turns are represented by successive integers. Other applications might combine the name of a dialog participant with an integer value representing the turn number for that participant. Ordering semantics for comparison of emma:dialog-turn values are deliberately unspecified and left for applications to define.

Example:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:dialog-turn="u8"
      emma:medium="acoustic" emma:mode="voice">
    <quantity>3</quantity>
  </emma:interpretation>
</emma:emma>

4.2.18 Semantic representation type: emma:result-format attribute

Annotation: emma:result-format
Definition: An attribute of type xsd:string containing a MIME type which indicates the representation used in the application semantics that appears within the contained emma:interpretation.
Applies to: emma:interpretation, emma:literal, emma:group, emma:one-of, and emma:sequence

Typically, the application semantics contained within EMMA is in XML format, as can be seen in examples throughout the specification. The application semantics can also be a simple string, contained within emma:literal. EMMA also accommodates other semantic representation formats, such as JSON (JavaScript Object Notation [JSON]), using CDATA within emma:literal. The function of the emma:result-format attribute is to make explicit the specific format of the semantic representation. The value is a MIME type. The value generally to be used for XML semantic representations is text/xml. If emma:result-format is not specified, the assumed default is text/xml. If a more specific XML MIME type is being used, then this should be indicated explicitly in emma:result-format, e.g. for RDF the emma:result-format would be application/rdf+xml. In the following example, the application semantic representation is JSON, and the MIME type application/json appears in emma:result-format, indicating to an EMMA processor what to expect within the contained emma:literal.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="int1"
      emma:confidence="0.75"
      emma:medium="acoustic"
      emma:mode="voice"
      emma:verbal="true"
      emma:function="dialog"
      emma:result-format="application/json">
    <emma:literal>
      <![CDATA[
        {
          drink: {
            liquid: "coke",
            drinksize: "medium"},
          pizza: {
            number: "3",
            pizzasize: "large",
            topping: [ "pepperoni", "mushrooms" ]
          }
        }
      ]]>
    </emma:literal>
  </emma:interpretation>
</emma:emma>

Note that while many of the examples of semantic representation in the specification are simple lists of attributes and values, EMMA interpretations can contain arbitrarily complex semantic representations. XML representations can be used for the payload, so representations can be nested, can have attributes, and ID references can be used to capture aspects of the interpretation such as variable binding or co-reference. Also, using emma:result-format and emma:literal as above, other kinds of logical representations and notations, not necessarily XML, can be carried as EMMA payloads.

4.2.19 Reference to emma:info element: emma:info-ref attribute

Annotation: emma:info-ref
Definition: An attribute of type xsd:IDREF referring to the id attribute of an emma:info element.
Applies to: emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data

The emma:info-ref annotation associates the EMMA result in the container element with a particular emma:info element. This allows a single emma:info block of application- and vendor-specific annotations to apply to multiple different members of an emma:one-of, emma:group, or emma:sequence. Alternatively, emma:info could appear separately as a child of each emma:interpretation. The benefit of using emma:info-ref is that it avoids the need to repeat the same block of emma:info for multiple different interpretations.

Example:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:info id="info1">
    <customer_type>residential</customer_type>
    <service_name>acme_travel_service</service_name>
  </emma:info>
  <emma:info id="info2">
    <customer_type>residential</customer_type>
    <service_name>acme_pizza_service</service_name>
  </emma:info>
  <emma:one-of emma:start="1087995961542"
      emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:confidence="0.75"
        emma:tokens="flights from boston to denver tomorrow"
        emma:info-ref="info1">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.68"
        emma:tokens="pizza with pepperoni and onions"
        emma:info-ref="info2">
      <order>pizza</order>
      <topping>pepperoni</topping>
      <topping>onion</topping>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.38"
        emma:tokens="pizza with peppers and cheese"
        emma:info-ref="info2">
      <order>pizza</order>
      <topping>pepperoni</topping>
      <topping>cheese</topping>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

4.2.20 Reference to emma:process-model element: emma:process-model-ref attribute

Annotation: emma:process-model-ref
Definition: An attribute of type xsd:IDREF referring to the id attribute of an emma:process-model element.
Applies to: emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data

The emma:process-model-ref annotation associates the EMMA result in the container element with an emma:process-model element. In the following example, the specific models used to produce two different object recognition results based on an image input are indicated on the interpretations using emma:process-model-ref, which references an emma:process-model element under emma:emma whose ref attribute contains a URI identifying the particular model used.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
    http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:process-model id="pm1"
      type="neural_network"
      ref="http://example.com/vision/vehicle"/>
  <emma:process-model id="pm2"
      type="neural_network"
      ref="http://example.com/vision/people"/>
  <emma:one-of
      emma:start="1087995961542"
      emma:end="1087995961542"
      emma:medium="visual"
      emma:mode="image"
      emma:process="http://example.com/mycompvision1.xml">
    <emma:interpretation
        emma:confidence="0.9"
        emma:process-model-ref="pm1">
      <object>aircraft</object>
    </emma:interpretation>
    <emma:interpretation
        emma:confidence="0.1"
        emma:process-model-ref="pm2">
      <object>person</object>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

4.2.21 Reference to emma:parameters element: emma:parameter-ref attribute

Annotation: emma:parameter-ref
Definition: An attribute of type xsd:IDREF referring to the id attribute of an emma:parameters element.
Applies to: emma:interpretation, emma:group, emma:one-of, and emma:sequence

The emma:parameter-ref annotation associates the EMMA result(s) in the container element it appears on with an emma:parameters element that specifies a series of parameters used to configure the processor that produced those result(s). This allows a set of parameters to be specified once in an EMMA document and referred to by multiple different interpretations, and different configurations of parameters can be associated with different interpretations. In the example below, there are two emma:parameters elements, and in the N-best list of alternative interpretations within emma:one-of each emma:interpretation references the relevant set of parameters using emma:parameter-ref.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:parameters id="parameters1" api-ref="voicexml2.1">
    <emma:parameter name="speedvsaccuracy" value=".5"/>
    <emma:parameter name="sensitivity" value=".6"/>
  </emma:parameters>
  <emma:parameters id="parameters2" api-ref="voicexml2.1">
    <emma:parameter name="speedvsaccuracy" value=".7"/>
    <emma:parameter name="sensitivity" value=".3"/>
  </emma:parameters>
  <emma:one-of emma:medium="acoustic" emma:mode="voice"
      emma:process="http://example.com/myasr1.xml">
    <emma:interpretation emma:parameter-ref="parameters1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation emma:parameter-ref="parameters2">
      <origin>Austin</origin>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

4.2.22 Human transcription: the emma:annotated-tokens attribute

Annotation: emma:annotated-tokens
Definition: An attribute of type xsd:string holding the reference sequence of tokens determined by a human annotator.
Applies to: emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, and application instance data

The emma:annotated-tokens attribute holds a list of input tokens. In the following description, the term tokens is used in the computational and syntactic sense of units of input, and not in the sense of XML tokens. The value held in emma:annotated-tokens is the list of the tokens of input as determined by a human annotator. For example, in the case of speech recognition this will contain the reference string. The emma:annotated-tokens annotation MAY be applied not just to the lexical words and phrases of language but to any level of input processing. Other examples of tokenization include phonemes, ink strokes, gestures, and any other discrete units of input at any level.

In the following example, a speech recognizer has processed an audio input signal and the hypothesized string "from cambridge to london tomorrow" is contained in emma:tokens. A human labeller has listened to the audio and added the reference string "from canterbury to london today" in the emma:annotated-tokens attribute.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:verbal="true"
      emma:signal="http://example.com/audio/input678.amr"
      emma:process="http://example.com/asr/params.xml"
      emma:tokens="from cambridge to london tomorrow"
      emma:annotated-tokens="from canterbury to london today">
    <origin>Cambridge</origin>
    <destination>London</destination>
    <date>tomorrow</date>
  </emma:interpretation>
</emma:emma>

In order to provide metadata on the annotation, such as the name of the annotator or the time of annotation, the more powerful emma:annotation element mechanism should be used. This also allows for structured annotations, such as labelling of a semantic interpretation in XML.

4.2.23 Partial Content: emma:partial-content

Annotation: emma:partial-content
Definition: An attribute of type xsd:boolean indicating whether the content of an element is partial; the full element can be retrieved by dereferencing the URI indicated in the ref attribute on the same element.
Applies to: emma:one-of, emma:group, emma:sequence, emma:lattice, emma:info, emma:annotation, emma:parameters, and application instance data

The emma:partial-content attribute is required on the element it applies to when the content contained within the element is a subset of the content contained within the element referred to through the ref attribute on the same element. If the local element is empty, but a full document can be retrieved from the server, then emma:partial-content must be true. If the element is empty and the element on the server is also empty, then emma:partial-content must be false. If emma:partial-content is not specified, the default value is false. The sketch below illustrates the attribute.
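A minimal sketch, assuming a server-side document at an illustrative URI that holds the full N-best list while the local emma:one-of carries only the top hypothesis:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of ref="http://example.com/results/full-nbest.emma"
      emma:partial-content="true"
      emma:medium="acoustic" emma:mode="voice">
    <!-- only the top hypothesis is included locally; the full
         list can be retrieved from the URI in the ref attribute -->
    <emma:interpretation emma:confidence="0.9">
      <origin>Boston</origin>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>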

4.3 Scope of EMMA annotations

The emma:derived-from element (Section 4.1.2) can be used to capture both sequential and composite derivations. This section concerns the scope of EMMA annotations across sequential derivations of user input connected using the emma:derived-from element (Section 4.1.2). Sequential derivations involve processing steps that do not involve multimodal integration, such as applying natural language understanding and then reference resolution to a speech transcription. EMMA derivations describe only single turns of user input and are not intended to describe a sequence of dialog turns.

For example, an EMMA document could contain emma:interpretation elements for the transcription, interpretation, and reference resolution of a speech input, utilizing the id values raw, better, and best respectively:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw"
        emma:process="http://example.com/myasr1.xml"
        emma:medium="acoustic" emma:mode="voice">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better"
        emma:process="http://example.com/mynlu1.xml">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best"
      emma:process="http://example.com/myrefresolution1.xml">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
  </emma:interpretation>
</emma:emma>

Each member of the derivation chain is linked to the previous one by an emma:derived-from element (Section 4.1.2), which has an attribute resource that provides a pointer to the emma:interpretation from which it is derived. The emma:process annotation (Section 4.2.2) provides a pointer to the process used for each stage of the derivation.

The following EMMA example represents the same derivation as above but with a more fully specified set of annotations:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw"
        emma:process="http://example.com/myasr1.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:signal="http://example.com/signals/sg23.wav"
        emma:confidence="0.6"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:function="dialog"
        emma:verbal="true"
        emma:tokens="from boston to denver tomorrow"
        emma:lang="en-US">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better"
        emma:process="http://example.com/mynlu1.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:signal="http://example.com/signals/sg23.wav"
        emma:confidence="0.8"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:function="dialog"
        emma:verbal="true"
        emma:tokens="from boston to denver tomorrow"
        emma:lang="en-US">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best"
      emma:process="http://example.com/myrefresolution1.xml"
      emma:source="http://example.com/microphone/NC-61"
      emma:signal="http://example.com/signals/sg23.wav"
      emma:confidence="0.8"
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:verbal="true"
      emma:tokens="from boston to denver tomorrow"
      emma:lang="en-US">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
  </emma:interpretation>
</emma:emma>

EMMA annotations on earlier stages of the derivation often remain accurate at later stages of the derivation. Although this can be captured in EMMA by repeating the annotations on each emma:interpretation within the derivation, as in the example above, there are two disadvantages of this approach to annotation. First, the repetition of annotations makes the resulting EMMA documents significantly more verbose. Second, EMMA processors used for intermediate tasks such as natural language understanding and reference resolution will need to read in all of the annotations and write them all out again.

EMMA overcomes these problems by assuming that annotations on earlier stages of a derivation automatically apply to later stages of the derivation unless a new value is specified. Later stages of the derivation essentially inherit annotations from earlier stages in the derivation. For example, if there were an emma:source annotation on the transcription (raw), it would also apply to the later stages of the derivation such as the result of natural language understanding (better) or reference resolution (best).

Because of the assumption in EMMA that annotations have scope over later stages of a sequential derivation, the example EMMA document above can be equivalently represented as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw"
        emma:process="http://example.com/myasr1.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:signal="http://example.com/signals/sg23.wav"
        emma:confidence="0.6"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:function="dialog"
        emma:verbal="true"
        emma:tokens="from boston to denver tomorrow"
        emma:lang="en-US">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better"
        emma:process="http://example.com/mynlu1.xml"
        emma:confidence="0.8">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation id="best"
      emma:process="http://example.com/myrefresolution1.xml">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
  </emma:interpretation>
</emma:emma>

The fully specified derivation illustrated above is equivalent to the reduced form derivation following it, where only annotations with new values are specified at each stage. These two EMMA documents MUST yield the same result when processed by an EMMA processor.

The emma:confidence annotation is respecified on the better interpretation. This indicates the confidence score for natural language understanding, whereas emma:confidence on the raw interpretation indicates the speech recognition confidence score.

In order to determine the full set of annotations that apply to an emma:interpretation element, an EMMA processor or script needs to access the annotations directly on that element and, for any that are not specified, follow the reference in the resource attribute of the emma:derived-from element to add in annotations from earlier stages of the derivation.

The EMMA annotations break down into three groups with respect to their scope in sequential derivations. One group of annotations always holds true for all members of a sequential derivation. A second group is always respecified on each stage of the derivation. A third group may or may not be respecified.

Scope of Annotations in Sequential Derivations

Applies to whole derivation: emma:signal, emma:signal-size, emma:dialog-turn, emma:source, emma:medium, emma:mode, emma:function, emma:verbal, emma:lang, emma:tokens, emma:start, emma:end, emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start, emma:duration

Specified at each stage of derivation: emma:derived-from, emma:process

May be respecified: emma:confidence, emma:cost, emma:grammar-ref, emma:model-ref, emma:no-input, emma:uninterpreted

One potential problem with this annotation scoping mechanism is that earlier annotations could be lost if earlier stages of a derivation were dropped in order to reduce message size. This problem can be overcome by considering annotation scope at the point where earlier derivation stages are discarded and populating the final interpretation in the derivation with all of the annotations which it could inherit. For example, if the raw and better stages were dropped, the resulting EMMA document would be:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:start="1087995961542"
      emma:end="1087995963542"
      emma:process="http://example.com/myrefresolution1.xml"
      emma:source="http://example.com/microphone/NC-61"
      emma:signal="http://example.com/signals/sg23.wav"
      emma:confidence="0.8"
      emma:medium="acoustic"
      emma:mode="voice"
      emma:function="dialog"
      emma:verbal="true"
      emma:tokens="from boston to denver tomorrow"
      emma:lang="en-US">
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>03152003</date>
  </emma:interpretation>
</emma:emma>

Annotations on an emma:one-of element are assumed to apply to all of the container elements within the emma:one-of.

If an emma:one-of appears within another emma:one-of, then annotations on the parent emma:one-of are assumed to apply to the children of the child emma:one-of.

Annotations on emma:group or emma:sequence do not apply to their child elements, as illustrated in the sketch below.
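A minimal sketch with illustrative application elements: because annotations do not scope down from emma:group, each child interpretation states its own emma:medium and emma:mode:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:group>
    <!-- each child carries its own medium/mode annotations;
         placing them on emma:group would not apply them here -->
    <emma:interpretation emma:medium="acoustic" emma:mode="voice">
      <action>zoom</action>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <location><type>area</type></location>
    </emma:interpretation>
  </emma:group>
</emma:emma>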

5. Conformance

The contents of this section are normative.

5.1 Conforming EMMA Documents

A document is a Conforming EMMA Document if it meets both the following conditions:

The EMMA specification and these conformance criteria provide no designated size limits on any aspect of EMMA documents. There are no maximum values on the number of elements, the amount of character data, or the number of characters in attribute values.

Within this specification, the term URI refers to a Uniform Resource Identifier as defined in [RFC3986] and extended in [RFC3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications.

5.2 Using EMMA with other Namespaces

The EMMA namespace is intended to be used with other XML namespaces as per the Namespaces in XML Recommendation [XMLNS]. Future work by W3C is expected to address ways to specify conformance for documents involving multiple namespaces.

5.3 Conforming EMMA Processors

An EMMA processor is a program that can process and/or generate Conforming EMMA documents.

In a Conforming EMMA Processor, the XML parser MUST be able to parse and process all XML constructs defined by XML 1.1 [XML] and Namespaces in XML [XMLNS]. It is not required that a Conforming EMMA Processor use a validating XML parser.

A Conforming EMMA Processor MUST correctly understand and apply the semantics of each markup element or attribute as described by this document.

There is, however, no conformance requirement with respect to performance characteristics of the EMMA Processor. For instance, no statement is required regarding the accuracy, speed, or other characteristics of output produced by the processor. No statement is made regarding the size of input that an EMMA Processor is required to support.

Appendices

Appendix A. XML and RELAX NG schemata

This section is Normative.

This section defines the formal syntax for EMMA documents in terms of a normative XML Schema.

The schema provided here is for the EMMA 1.0 Recommendation. No schema exists as yet for the EMMA 1.1 Working Draft, as it is a work in progress.

There are both an XML Schema and a RELAX NG Schema for the EMMA markup. The latest version of the XML Schema for EMMA is available at http://www.w3.org/TR/emma/emma.xsd and the RELAX NG Schema can be found at http://www.w3.org/TR/emma/emma.rng.

For stability it is RECOMMENDED that you use the dated URIs available at http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd and http://www.w3.org/TR/2009/REC-emma-20090210/emma.rng.

Appendix B. MIME type

This section is Normative.

The media type associated with the EMMA: Extensible MultiModal Annotation markup language specification is "application/emma+xml" and the filename suffix is ".emma", as defined in Appendix B.1 of the EMMA: Extensible Multimodal Annotation specification.

Appendix C. emma:hook and SRGS

This section is Informative.

One of the most powerful aspects of multimodal interfaces is their ability to support user inputs which are distributed over the available input modes. These composite inputs are contributions made by the user within a single turn which have component parts in different modes. For example, the user might say "zoom in here" in the speech mode while drawing an area on a graphical display in the ink mode. One of the central motivating factors for this kind of input is that different kinds of communicative content are best suited to different input modes. In the example of a user drawing an area on a map and saying "zoom in here", the zoom command is easiest to provide in speech, but the spatial information, the specific area, is easier to provide in ink.

Enabling composite multimodality is critical in ensuring that multimodal systems support more natural and effective interaction for users. In order to support composite inputs, a multimodal architecture must provide some kind of multimodal integration mechanism. In the W3C Multimodal Interaction Framework [MMIFramework], multimodal integration can be handled by an integration component which follows the application of speech understanding and other kinds of interpretation procedures for individual modes.

Given the broad range of different techniques being employed for multimodal integration, and the extent to which this is an ongoing research problem, standardization of the specific method or algorithm used for multimodal integration is not appropriate at this time. In order to facilitate the development and interoperation of different multimodal integration mechanisms, EMMA provides markup enabling application-independent specification of the elements in the application markup where content from another mode needs to be integrated. These representation 'hooks' can then be used by different kinds of multimodal integration components and algorithms to drive the process of multimodal integration. In the processing of a composite multimodal input, the result of applying a mode-specific interpretation component to each of the individual modes will be EMMA markup describing the possible interpretations of that input.

One way to build an EMMA representation of a spoken input such as "zoom in here" is to use grammar rules in the W3C Speech Recognition Grammar Specification [SRGS] with Semantic Interpretation [SISR] tags that build the application semantics, including the emma:hook attribute. In this approach, [ECMAScript] is specified in order to build up an object representing the semantics. The resulting ECMAScript object is then translated to XML.

For our example case of "zoom in here", the following SRGS rule could be used. The Semantic Interpretation for Speech Recognition specification [SISR] provides a reserved property _nsprefix for indicating the namespace to be used with an attribute.

<rule>
  zoom in here
  <tag>
    $.command = new Object();
    $.command.action = "zoom";
    $.command.location = new Object();
    $.command.location._attributes = new Object();
    $.command.location._attributes.hook = new Object();
    $.command.location._attributes.hook._nsprefix = "emma";
    $.command.location._attributes.hook._value = "ink";
    $.command.location.type = "area";
  </tag>
</rule>

Application of this rule will result in the following ECMAScript object being built.

command: {
  action: "zoom"
  location: {
    _attributes: {
      hook: {
        _nsprefix: "emma"
        _value: "ink"
      }
    }
    type: "area"
  }
}

SI processing in an XML environment would generate the following document:

<command>
  <action>zoom</action>
  <location emma:hook="ink">
    <type>area</type>
  </location>
</command>

This XML fragment might then appear within an EMMA document as follows:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="acoustic"
      emma:mode="voice">
    <command>
      <action>zoom</action>
      <location emma:hook="ink">
        <type>area</type>
      </location>
    </command>
  </emma:interpretation>
</emma:emma>

The emma:hook annotation indicates that this speech input needs to be combined with ink input such as the following:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="tactile"
      emma:mode="ink">
    <location>
      <type>area</type>
      <points>42.1345 -37.128 42.1346 -37.120 ... </points>
    </location>
  </emma:interpretation>
</emma:emma>

This representation could be generated by a pen modality component performing gesture recognition and interpretation. The input to the component would be an Ink Markup Language [INKML] specification of the ink trace, and the output would be the EMMA document above.

The combination will result in the following EMMA document for the combined speech and pen multimodal input.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation
      emma:medium="acoustic tactile"
      emma:mode="voice ink"
      emma:process="http://example.com/myintegrator.xml">
    <emma:derived-from resource="http://example.com/voice1.emma/#voice1" composite="true"/>
    <emma:derived-from resource="http://example.com/pen1.emma/#pen1" composite="true"/>
    <command>
      <action>zoom</action>
      <location>
        <type>area</type>
        <points>42.1345 -37.128 42.1346 -37.120 ... </points>
      </location>
    </command>
  </emma:interpretation>
</emma:emma>

There are two components to the process of integrating these two pieces of semantic markup. The first is to ensure that the two are compatible; that is, that no semantic constraints are violated. The second is to fuse the content from the two sources. In our example, the <type>area</type> element is intended to indicate that this speech command requires integration with an area gesture rather than, for example, a line gesture, which would have the subelement <type>line</type>. This constraint needs to be enforced by whatever mechanism is responsible for multimodal integration.

Many different techniques could be used for achieving this integration of the semantic interpretation of the pen input, a <location> element, with the corresponding <location> element in the speech. The emma:hook annotation simply serves to indicate the existence of this relationship.

One way to achieve both the compatibility checking and fusion of content from the two modes is to use a well-defined general purpose matching mechanism such as unification. Graph unification [Graph unification] is a mathematical operation defined over directed acyclic graphs which captures both of the components of integration in a single operation: the application of the semantic constraints and the fusing of content. One possible semantics for the emma:hook markup indicates that content from the required mode needs to be unified with that position in the application semantics. In order to unify, two elements must not have any conflicting values for subelements or attributes. This procedure can be defined recursively so that elements within the subelements must also not clash, and so on. The result of unification is the union of all of the elements and attributes of the two elements that are being unified.

In addition to the unification operation, in the resulting emma:interpretation the emma:hook attribute needs to be removed and the emma:mode attribute changed to the list of the modes of the individual inputs, e.g. "voice ink".

Instead of the unification operation, for a specific application semantics, integration could be achieved using some other algorithm or script. The benefit of using the unification semantics for emma:hook is that it provides a general purpose mechanism for checking the compatibility of elements and fusing them, whatever the specific elements are in the application-specific semantic representation.

The benefit of using the emma:hook annotation for authors is that it provides an application-independent method for indicating where integration with content from another mode is required. If a general purpose integration mechanism is used, such as the unification approach described above, authors should be able to use the same integration mechanism for a range of different applications without having to change the integration rules or logic. For each application, the speech grammar rules [SRGS] need to assign emma:hook to the appropriate elements in the semantic representation of the speech. The general purpose multimodal integration mechanism will use the emma:hook annotations in order to determine where to add in content from other modes. Another benefit of the emma:hook mechanism is that it facilitates interoperability among different multimodal integration components, so long as they are all general purpose and utilize emma:hook in order to determine where to integrate content.

The following provides a more detailed example of the use of the emma:hook annotation. In this example, spoken input is combined with two ink gestures. The semantic representation assigned to the spoken input "send this file to this" indicates two locations where content is required from ink input using emma:hook="ink":

<emma:emma version="1.1"    xmlns:emma="http://www.w3.org/2003/04/emma"    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"    xsi:schemaLocation="http://www.w3.org/2003/04/emma     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"    xmlns="http://www.example.com/example">  <emma:interpretationid="voice2"      emma:medium="acoustic"      emma:mode="voice"      emma:tokens="send this file to this"      emma:start="1087995961500"      emma:end="1087995963542">    <command>      <action>send</action>        <arg1>          <object emma:hook="ink">            <type>file</type>            <number>1</number>          </object>        </arg1>       <arg2>         <object emma:hook="ink">           <number>1</number>         </object>       </arg2>    </command>  </emma:interpretation></emma:emma>

The user gesturing on the two locations on the display can be represented using emma:sequence:

<emma:emma version="1.1"    xmlns:emma="http://www.w3.org/2003/04/emma"    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"    xsi:schemaLocation="http://www.w3.org/2003/04/emma     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"    xmlns="http://www.example.com/example">  <emma:sequenceid="ink2">    <emma:interpretationemma:start="1087995960500"      emma:end="1087995960900"      emma:medium="tactile"      emma:mode="ink">      <object>       <type>file</type>       <number>1</number>       <id>test.pdf</id>      <object>    </emma:interpretation>    <emma:interpretationemma:start="1087995961000"      emma:end="1087995961100"      emma:medium="tactile"      emma:mode="ink">      <object>        <type>printer</type>        <number>1</number>        <id>lpt1</id>      <object>    </emma:interpretation>  </emma:sequence></emma:emma>

A general purpose unification-based multimodal integration algorithm could use the emma:hook annotation as follows. It identifies the elements marked with emma:hook in document order. For each of those in turn, it attempts to unify the element with the corresponding element, in order, in the emma:sequence. Since none of the subelements conflict, the unification goes through, and as a result we have the following EMMA for the composite result:

<emma:emma version="1.1"    xmlns:emma="http://www.w3.org/2003/04/emma"    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"    xsi:schemaLocation="http://www.w3.org/2003/04/emma     http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"    xmlns="http://www.example.com/example"><emma:interpretationid="multimodal2"      emma:medium="acoustic tactile"      emma:mode="voice ink"      emma:tokens="send this file to this"      emma:process="http://example.com/myintegration.xml"      emma:start="1087995960500"      emma:end="1087995963542">  <emma:derived-from resource="http://example.com/voice2.emma/#voice2" composite="true"/>  <emma:derived-from resource="http://example.com/ink2.emma/#ink2" composite="true"/>  <command>   <action>send</action>    <arg1>     <object>       <type>file</type>       <number>1</number>        <id>test.pdf</id>     </object>    </arg1>    <arg2>     <object>       <type>printer</type>        <number>1</number>       <id>lpt1</id>     </object>    </arg2>  </command></emma:interpretation></emma:emma>

Appendix D. EMMA event interface

This section is Informative.

The W3C Document Object Model [DOM] defines platform- and language-neutral interfaces that give programs and scripts the means to dynamically access and update the content, structure and style of documents. DOM Events define a generic event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event.

This section of the EMMA specification extends the DOM Event interface for use with events that describe interpreted user input in terms of a DOM Node for an EMMA document.

// File: emma.idl
#ifndef _EMMA_IDL_
#define _EMMA_IDL_

#include "dom.idl"
#include "views.idl"
#include "events.idl"

#pragma prefix "dom.w3c.org"

module emma
{
  typedef dom::DOMString DOMString;
  typedef dom::Node Node;

  interface EMMAEvent : events::UIEvent {
    readonly attribute dom::Node node;
    void initEMMAEvent(in DOMString typeArg,
                       in boolean canBubbleArg,
                       in boolean cancelableArg,
                       in Node node);
  };
};

#endif // _EMMA_IDL_
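As a non-normative sketch, script in a user agent supporting this interface might consume an EMMAEvent as follows (TypeScript; the "emmaresult" event name and the recognizer event source are assumptions made for illustration and are not defined by this specification):

// Mirror of the IDL above as a TypeScript interface.
interface EMMAEvent extends UIEvent {
  readonly node: Node;   // root node of the EMMA document for this user input
}

declare const recognizer: EventTarget;   // assumed source of EMMA events

recognizer.addEventListener("emmaresult", (evt: Event) => {
  const { node } = evt as EMMAEvent;
  // Resolve the owning document, then find the first emma:interpretation.
  const doc = node.ownerDocument ?? (node as Document);
  const interps = doc.getElementsByTagNameNS(
    "http://www.w3.org/2003/04/emma", "interpretation");
  if (interps.length > 0) {
    const mode = interps[0].getAttributeNS(
      "http://www.w3.org/2003/04/emma", "mode");
    console.log("input mode:", mode);    // e.g. "voice ink"
  }
});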

Appendix E. References

E.1 Normative references

BCP47
A. Phillips and M. Davis, editors. Tags for the Identification of Languages, IETF, September 2006.
RFC3023
M. Murata et al., editors. XML Media Types, IETF RFC 3023, January 2001.
RFC2046
N. Freed and N. Borenstein, editors. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, IETF RFC 2046, November 1996.
RFC2119
S. Bradner, editor. Key words for use in RFCs to Indicate Requirement Levels, IETF RFC 2119, March 1997.
RFC3986
T. Berners-Lee et al., editors. Uniform Resource Identifier (URI): Generic Syntax, IETF RFC 3986, January 2005.
RFC3987
M. Duerst and M. Suignard, editors. Internationalized Resource Identifiers (IRIs), IETF RFC 3987, January 2005.
XML
Tim Bray et al., editors. Extensible Markup Language (XML) 1.1, World Wide Web Consortium, W3C Recommendation, 2004.
XMLNS
Tim Bray et al., editors. Namespaces in XML 1.1, World Wide Web Consortium, W3C Recommendation, 2006.
XML Schema Structures
Henry S. Thompson et al., editors. XML Schema Part 1: Structures Second Edition, World Wide Web Consortium, W3C Recommendation, 2004.
XML Schema Datatypes
Paul V. Biron and Ashok Malhotra, editors. XML Schema Part 2: Datatypes Second Edition, World Wide Web Consortium, W3C Recommendation, 2004.

E.2 Informative references

DOM
Document Object Model, World Wide Web Consortium, 2005.
ECMAScript
ECMAScript Language Specification (ECMA-262), Ecma International.
INKML
Stephen M. Watt and Tom Underhill, editors. Ink Markup Language (InkML), World Wide Web Consortium, W3C Recommendation, 2011.
JSON
D. Crockford. The application/json Media Type for JavaScript Object Notation, IETF RFC 4627, 2006.
SISR
Luc Van Tichelen and Dave Burke, editors. Semantic Interpretation for Speech Recognition, World Wide Web Consortium, W3C Recommendation, 2007.
SRGS
Andrew Hunt and Scott McGlashan, editors. Speech Recognition Grammar Specification Version 1.0, World Wide Web Consortium, W3C Recommendation, 2004.
XFORMS
John M. Boyer et al., editors. XForms 1.0 (Second Edition), World Wide Web Consortium, W3C Recommendation, 2006.
RELAX-NG
James Clark and Makoto Murata, editors. RELAX NG Specification, OASIS, Committee Specification, 2001.
EMMA Requirements
Stephane H. Maes and Stephen Potter, editors. Requirements for EMMA, World Wide Web Consortium, W3C Note, 2003.
Graph Unification
Bob Carpenter. The Logic of Typed Feature Structures, Cambridge Tracts in Theoretical Computer Science 32, Cambridge University Press, 1992.
Kevin Knight. Unification: A Multidisciplinary Survey, ACM Computing Surveys, 21(1), 1989.
Michael Johnston. Unification-based Multimodal Parsing, Proceedings of the Association for Computational Linguistics, pp. 624-630, 1998.
MMI Framework
James A. Larson, T.V. Raman and Dave Raggett, editors. W3C Multimodal Interaction Framework, World Wide Web Consortium, W3C Note, 2003.
MMI Requirements
Stephane H. Maes and Vijay Saraswat, editors. Multimodal Interaction Requirements, World Wide Web Consortium, W3C Note, 2003.
Emotion ML
Marc Schröder, editor. Emotion Markup Language (EmotionML) 1.0, World Wide Web Consortium, W3C Last Call Working Draft, 2011.
EMMA Use Cases
Michael Johnston, editor. Use Cases for Possible Future EMMA Features, World Wide Web Consortium, W3C Note, 2009.
Geolocation
Geolocation API Specification, World Wide Web Consortium, W3C Proposed Recommendation, 10 May 2012. See http://www.w3.org/TR/geolocation-API/
WGS84
National Imagery and Mapping Agency Technical Report 8350.2, Third Edition, National Imagery and Mapping Agency, 3 January 2000. See http://earth-info.nga.mil/GandG/publications/tr8350.2/wgs84fin.pdf

Appendix F. Changes since EMMA 1.0

This section is Informative.

Since the publication of the EMMA 1.0 Recommendation, the following changes have been made.

Appendix G. Acknowledgements

This section is Informative.

The editors would like to recognize the contributions of the current and former members of the W3C Multimodal Interaction Working Group (listed in alphabetical order):

Kazuyuki Ashimura, W3C
Patrizio Bergallo (until 2008, while at Loquendo)
Jerry Carter (while at Nuance Communications)
Wu Chou, Avaya
Max Froumentin (until 2006, while at W3C)
Katriina Halonen, Nokia
Jin Liu, T-Systems
Gerry McCobb, Openstream
Roberto Pieraccini (while at Speechcycle)
Stephen Potter (while at Microsoft)
Dave Raggett (until 2007, while at Volantis and Canon)
Massimo Romanelli, DFKI
Yuan Shao, Canon
 
 
 
