Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
The W3C Multimodal Interaction Working Group aims to develop specifications to enable access to the Web using multimodal interaction. This document is part of a set of specifications for multimodal systems, and provides details of an XML markup language for containing and annotating the interpretation of user input. Examples of interpretation of user input are a transcription into words of a raw signal, for instance derived from speech, pen or keystroke input, a set of attribute/value pairs describing their meaning, or a set of attribute/value pairs describing a gesture. The interpretation of the user's input is expected to be generated by signal interpretation processes, such as speech and ink recognition, semantic interpreters, and other types of processors, for use by components that act on the user's inputs such as interaction managers.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the 27 June 2013 Second Public Working Draft of "EMMA: Extensible MultiModal Annotation markup language Version 1.1". It has been produced by the Multimodal Interaction Working Group, which is part of the Multimodal Interaction Activity.
This specification describes markup for representing interpretations of user input (speech, keystrokes, pen input etc.) together with annotations for confidence scores, timestamps, input medium etc., and forms part of the proposals for the W3C Multimodal Interaction Framework.
The EMMA: Extensible Multimodal Annotation 1.0 specification was published as a W3C Recommendation in February 2009. Since then there have been numerous implementations of the standard and extensive feedback has come in regarding desired new features and clarifications requested for existing features. The W3C Multimodal Interaction Working Group examined a range of different use cases for extensions of the EMMA specification and published a W3C Note on Use Cases for Possible Future EMMA Features [EMMA Use Cases]. In this working draft of EMMA 1.1, we have developed a set of new features based on feedback from implementers and have also added clarification text in a number of places throughout the specification. The new features include: support for adding human annotations (emma:annotation, emma:annotated-tokens), support for inline specification of process parameters (emma:parameters, emma:parameter, emma:parameter-ref), support for specification of models used in processing beyond grammars (emma:process-model, emma:process-model-ref), extensions to emma:grammar to enable inline specification of grammars, a new mechanism for indicating which grammars are active (emma:grammar-active, emma:active), support for non-XML semantic payloads (emma:result-format), support for multiple emma:info elements and reference to the emma:info relevant to an interpretation (emma:info-ref), and a new attribute to complement the emma:medium and emma:mode attributes that enables specification of the modality used to express an input (emma:expressed-through).
The changes from the last working draft are:

- An emma:location element was added for specification of the location of the device or sensor which captured the input.
- A ref attribute was added to a number of elements, allowing for shorter EMMA documents which use URIs to point to content stored outside of the document: emma:one-of, emma:sequence, emma:group, emma:info, emma:parameters, emma:lattice.
- emma:partial-content is introduced, which indicates whether the content in an element with ref is the full content or whether it is partial and more can be retrieved by following the URI in ref.
- The emma:emma element is extended with doc-ref and prev-doc attributes that indicate where the document can be retrieved from and where the previous document in a sequence of inputs can be retrieved from.
- emma:lattice is also extended so that an EMMA document can contain both an N-best list and a lattice side-by-side.

Also, changes from EMMA 1.0 can be found in Appendix F.
Comments are welcome on www-multimodal@w3.org (archive). See W3C mailing list and archive usage guidelines.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The sections in the main body of this document are normative unless otherwise specified. The appendices in this document are informative unless otherwise indicated explicitly. The informative parts of this specification are identified by "Informative" labels within sections.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
- emma:model element
- emma:derived-from element and emma:derivation element
- emma:grammar element
- emma:grammar-active element
- emma:info element
- emma:endpoint-info element and emma:endpoint element
- emma:process-model element
- emma:parameters and emma:parameter elements
- emma:annotation element
- emma:location element
- emma:tokens attribute
- emma:process attribute
- emma:no-input attribute
- emma:uninterpreted attribute
- emma:lang attribute
- emma:signal and emma:signal-size attributes
- emma:media-type attribute
- emma:confidence attribute
- emma:source attribute
- emma:medium, emma:mode, emma:function, emma:verbal, emma:device-type, and emma:expressed-through attributes
- emma:hook attribute
- emma:cost attribute
- emma:endpoint-role, emma:endpoint-address, emma:port-type, emma:port-num, emma:message-id, emma:service-name, emma:endpoint-pair-ref, emma:endpoint-info-ref attributes
- emma:grammar element: emma:grammar-ref attribute
- emma:model element: emma:model-ref attribute
- emma:dialog-turn attribute
- emma:result-format attribute
- emma:info element: emma:info-ref attribute
- emma:process-model element: emma:process-model-ref attribute
- emma:parameters element: emma:parameter-ref attribute
- emma:annotated-tokens attribute
- emma:partial-content attribute
- emma:hook and SRGS (Informative)

This section is Informative.
This document presents an XML specification for EMMA, an Extensible MultiModal Annotation markup language, responding to the requirements documented in Requirements for EMMA [EMMA Requirements]. This markup language is intended for use by systems that provide semantic interpretations for a variety of inputs, including but not necessarily limited to, speech, natural language text, GUI and ink input.
It is expected that this markup will be used primarily as a standard data interchange format between the components of a multimodal system; in particular, it will normally be automatically generated by interpretation components to represent the semantics of users' inputs, not directly authored by developers.
The language is focused on annotating single inputs from users, which may be either from a single mode or a composite input combining information from multiple modes, as opposed to information that might have been collected over multiple turns of a dialog. The language provides a set of elements and attributes that are focused on enabling annotations on user inputs and interpretations of those inputs.
An EMMA document can be considered to hold three types of data:
instance data
Application-specific markup corresponding to input information which is meaningful to the consumer of an EMMA document. Instances are application-specific and built by input processors at runtime. Given that utterances may be ambiguous with respect to input values, an EMMA document may hold more than one instance.
data model
Constraints on the structure and content of an instance. The data model is typically pre-established by an application, and may be implicit, that is, unspecified.
metadata
Annotations associated with the data contained in the instance. Annotation values are added by input processors at runtime. In EMMA 1.1 annotations may also result from transcription and other activities by human annotators.
Given the assumptions above about the nature of data represented in an EMMA document, the following general principles apply to the design of EMMA:
- Instance data need not be XML; a non-XML format is indicated through the emma:result-format attribute. EMMA will remain agnostic to the specific details of the format. (If it is XML, the instance data is assumed to be sufficiently structured to enable the association of annotative data.)
- Vendor- and application-specific annotations may be placed in the emma:info element (Section 4.1.5).
- The annotations of EMMA should be considered 'normative' in the sense that if an EMMA component produces annotations as described in Section 3 and Section 4, these annotations must be represented using the EMMA syntax. The Multimodal Interaction Working Group may address in later drafts the issues of modularization and profiling; that is, which sets of annotations are to be supported by which classes of EMMA component.
The general purpose of EMMA is to represent information automatically extracted from a user's input by an interpretation component, where input is to be taken in the general sense of a meaningful user input in any modality supported by the platform. The reader should refer to the sample architecture in the W3C Multimodal Interaction Framework [MMI Framework], which shows EMMA conveying content between user input modality components and an interaction manager.
Components that generate EMMA markup include signal interpretation processes such as speech and ink recognizers and semantic interpreters. Components that use EMMA include components that act on the user's inputs, such as interaction managers.
Although not a primary goal of EMMA, a platform may also choose to use this general format as the basis of a general semantic result that is carried along and filled out during each stage of processing. In addition, future systems may also potentially make use of this markup to convey abstract semantic content to be rendered into natural language by a natural language generation component.
Relative timestamps are expressed with respect to a reference point given by emma:time-ref-uri; emma:time-ref-anchor-point allows you to specify whether the referenced anchor is the start or end of the interval. URI-valued attributes in EMMA use the anyURI primitive as defined in XML Schema Part 2: Datatypes Second Edition, Section 3.2.17 [SCHEMA2].

This section is Informative.
As noted above, the main components of an interpreted user input in EMMA are the instance data, an optional data model, and the metadata annotations that may be applied to that input. The realization of these components in EMMA is as follows:
An EMMA interpretation is the primary unit for holding user input as interpreted by an EMMA processor. As will be seen below, multiple interpretations of a single input are possible.
EMMA provides a simple structural syntax for the organization of interpretations and instances, and an annotative syntax to apply the annotation to the input data at different levels.
An outline of the structural syntax and annotations found in EMMA documents is as follows. A fuller definition may be found in the description of individual elements and attributes in Section 3 and Section 4.
- The root element, emma:emma, holds EMMA version and namespace information, and provides a container for one or more of the following interpretation and container elements (Section 3.1).
- The emma:interpretation element contains a given interpretation of the input and holds application specific markup (Section 3.2).
- emma:one-of is a container for one or more interpretation elements or container elements and denotes that these are mutually exclusive interpretations (Section 3.3.1).
- emma:group is a general container for one or more interpretation elements or container elements. It can be associated with arbitrary grouping criteria (Section 3.3.2).
- emma:sequence is a container for one or more interpretation elements or container elements and denotes that these are sequential in time (Section 3.3.3).
- The emma:lattice element is used to contain a series of emma:arc and emma:node elements that define a lattice of words, gestures, meanings or other symbols. The emma:lattice element appears within the emma:interpretation element (Section 3.4).
- The emma:literal element is used as a wrapper when the application semantics is a string literal (Section 3.5).
- Annotations such as emma:derived-from, emma:endpoint-info, and emma:info are represented as elements so that they can occur more than once within an element and can contain internal structure (Section 4.1).
- Annotations such as emma:start, emma:end, emma:confidence, and emma:tokens are represented as attributes. They can appear on emma:interpretation elements. Some can appear on container elements, lattice elements, and elements in the application-specific markup (Section 4.2).

From the defined root node emma:emma, the structure of an EMMA document consists of a tree of EMMA container elements (emma:one-of, emma:sequence, emma:group) terminating in a number of interpretation elements (emma:interpretation). The emma:interpretation elements serve as wrappers for either application namespace markup describing the interpretation of the user's input, or an emma:lattice element or emma:literal element. A single emma:interpretation may also appear directly under the root node.
The EMMA elements emma:emma, emma:interpretation, emma:one-of, and emma:literal and the EMMA attributes emma:no-input, emma:uninterpreted, emma:medium, and emma:mode are required of all implementations. The remaining elements and attributes are optional and may be used in some implementations and not others, depending on the specific modalities and processing being represented.
To illustrate this, here is an example of an EMMA document representing input to a flight reservation application. In this example there are two speech recognition results and associated semantic representations of the input. The system is uncertain whether the user meant "flights from Boston to Denver" or "flights from Austin to Denver". The annotations to be captured are timestamps and confidence scores for the two inputs.
Example:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:confidence="0.75"
        emma:tokens="flights from boston to denver">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.68"
        emma:tokens="flights from austin to denver">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
Attributes on the root emma:emma element indicate the version and namespace. The emma:emma element contains an emma:one-of element which contains a disjunctive list of possible interpretations of the input. The actual semantic representation of each interpretation is within the application namespace. In the example here the application specific semantics involves elements origin and destination indicating the origin and destination cities for looking up a flight. The timestamp is the same for both interpretations and it is annotated using values in milliseconds in the emma:start and emma:end attributes on the emma:one-of. The confidence scores and tokens associated with each of the inputs are annotated using the EMMA annotation attributes emma:confidence and emma:tokens on each of the emma:interpretation elements.
Attributes in EMMA cascade from a containing emma:one-of element to the individual interpretations. In the example above, the emma:start, emma:end, emma:medium, and emma:mode attributes are all specified once on emma:one-of but apply to both of the contained emma:interpretation elements. This is an important mechanism as it limits the need to repeat annotations. More details on the scope of annotations among EMMA structural elements, and also on the scope of annotations within derivations, where multiple different processing stages apply to an input, can be found in Section 4.3.
Many EMMA elements allow for content to be specified either inline or by reference using the ref attribute. This is an important mechanism as it allows for EMMA documents to be less verbose and yet allows the EMMA consumer to access content from an external document, possibly on a remote server. For example, in the case of emma:grammar a grammar can either be specified inline within the element or the ref attribute on emma:grammar can indicate the location where the grammar document can be retrieved. Similarly with emma:model a data model can be specified inline or by reference through the ref attribute. A ref attribute can also be used on the EMMA container elements emma:sequence, emma:one-of, emma:group, and emma:lattice. In these cases, the ref attribute provides a pointer to a portion of an external EMMA document, possibly on a remote server. This can be achieved using URI ID references to pick out a particular element within the external EMMA document. One use case for ref with the container elements is to allow for inline content to be partial and for the ref to provide access to the full content. For example, in the case of emma:one-of, an EMMA document delivered to an EMMA consumer could contain an abbreviated list of interpretations, e.g. the top 3, while an emma:one-of element accessible through the URI in ref could include a more inclusive list of 20 emma:interpretation elements. The emma:partial-content attribute MUST be used on the partially specified element if the ref refers to a more fully specified element. The emma:ref attribute can also be used on emma:info, emma:parameters, and emma:annotation. The use of ref on specific elements is described and exemplified in the specific section describing each element.
An EMMA data model expresses the constraints on the structure and content of instance data, for the purposes of validation. As such, the data model may be considered as a particular kind of annotation (although, unlike other EMMA annotations, it is not a feature pertaining to a specific user input at a specific moment in time; it is rather a static and, by its very definition, application-specific structure). The specification of a data model in EMMA is optional.
Since Web applications today use different formats to specify data models, e.g. XML Schema Part 1: Structures Second Edition [XML Schema Structures], XForms 1.0 (Second Edition) [XFORMS], RELAX NG Specification [RELAX-NG], etc., EMMA itself is agnostic to the format of data model used.
Data model definition and reference is defined in Section 4.1.1.
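As a minimal, non-normative sketch (the schema URI and model id here are hypothetical), a data model might be declared once by reference and associated with an interpretation through the emma:model-ref attribute:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- hypothetical data model, specified by reference rather than inline -->
  <emma:model id="flightmodel" ref="http://www.example.com/models/flight.xsd"/>
  <emma:interpretation emma:medium="acoustic" emma:mode="voice"
      emma:model-ref="flightmodel">
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>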
An EMMA attribute is qualified with the EMMA namespace prefix if the attribute can also be used as an in-line annotation on elements in the application's namespace. Most of the EMMA annotation attributes in Section 4.2 are in this category. An EMMA attribute is not qualified with the EMMA namespace prefix if the attribute only appears on an EMMA element. This rule ensures consistent usage of the attributes across all examples.
Attributes from other namespaces are permissible on all EMMA elements. As an example, xml:lang may be used to annotate the human language of character data content.
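For instance, a minimal sketch of such an annotation on an application element:

<emma:interpretation emma:medium="acoustic" emma:mode="voice">
  <!-- xml:lang, from the XML namespace, annotates the language of the content -->
  <destination xml:lang="en-US">Denver</destination>
</emma:interpretation>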
This section defines elements in the EMMA namespace which provide the structural syntax of EMMA documents.
emma:emma
Annotation | emma:emma |
---|---|
Definition | The root element of an EMMA document. |
Children | The emma:emma element MUST immediately contain a single emma:interpretation element or EMMA container element: emma:one-of, emma:group, emma:sequence. It MAY also contain an optional single emma:derivation element. It MAY also contain multiple optional emma:grammar elements, emma:model elements, emma:endpoint-info elements, emma:info elements, emma:process-model elements, emma:parameters elements, and emma:annotation elements. It MAY also contain a single emma:location element. |
Attributes | |
Applies to | None |
The root element of an EMMA document is named emma:emma. It holds a single emma:interpretation or EMMA container element (emma:one-of, emma:sequence, emma:group). It MAY also contain a single emma:derivation element containing earlier stages of the processing of the input (see Section 4.1.2). It MAY also contain multiple optional emma:grammar, emma:model, emma:endpoint-info, emma:info, emma:process-model, emma:parameters, and emma:annotation elements.
It MAY hold attributes for information pertaining to EMMA itself, along with any namespaces which are declared for the entire document, and any other EMMA annotative data. The emma:emma element and other elements and attributes defined in this specification belong to the XML namespace identified by the URI "http://www.w3.org/2003/04/emma". In the examples, the EMMA namespace is generally declared using the attribute xmlns:emma on the root emma:emma element. EMMA processors MUST support the full range of ways of declaring XML namespaces as defined by Namespaces in XML 1.1 (Second Edition) [XMLNS]. Application markup MUST be declared either in an explicit application namespace, or in an undefined namespace by setting xmlns="".
For example:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  ...
</emma:emma>
or
<emma version="1.1"
    xmlns="http://www.w3.org/2003/04/emma">
  ...
</emma>
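As a minimal sketch of the rule for application markup, the default namespace can be reset with xmlns="" so that the application elements sit in an undefined namespace:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <!-- application markup placed in an undefined namespace -->
    <flight xmlns="">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </flight>
  </emma:interpretation>
</emma:emma>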
The optional attributes doc-ref and prev-doc MAY be used on emma:emma in order to indicate the location where the EMMA document comprising that emma:emma element can be retrieved from, and the location of the previous EMMA document in a sequence of interactions. One important use case for doc-ref is for client side logging. A client receiving an EMMA document can record the URI found in doc-ref in a log file instead of a local copy of the whole EMMA document. The prev-doc attribute provides a mechanism for tracking a sequence of EMMA documents representing the results of processing distinct turns of interaction by an EMMA processor.
In the following example, doc-ref provides a URI which indicates where the EMMA document embodied in this emma:emma can be retrieved from.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    doc-ref="http://example.com/trainapp/user123/emma0727080512.xml">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice"
      emma:function="dialog" emma:verbal="true"
      emma:signal="http://example.com/audio/input678.amr"
      emma:process="http://example.com/asr/params.xml"
      emma:tokens="trains to london tomorrow">
    <destination>London</destination>
    <date>tomorrow</date>
  </emma:interpretation>
</emma:emma>
In the following example, again doc-ref indicates where the EMMA document can be retrieved from, but in addition prev-doc indicates where the previous EMMA document can be retrieved from.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    doc-ref="http://example.com/trainapp/user123/emma0730080512.xml"
    prev-doc="http://example.com/trainapp/user123/emma0727080512.xml">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice"
      emma:function="dialog" emma:verbal="true"
      emma:signal="http://example.com/audio/input679.amr"
      emma:process="http://example.com/asr/params.xml"
      emma:tokens="from cambridge">
    <origin>Cambridge</origin>
  </emma:interpretation>
</emma:emma>
EMMA processors may use a number of different techniques to determine the prev-doc. It may, for example, be determined based on the session. In a session of interaction, a server processing requests can track the previous EMMA result for a client and indicate that in prev-doc. Alternatively, the URI of the last EMMA result could be passed in as a parameter in a request to an EMMA processor and returned in the prev-doc with the next result.

emma:interpretation element
Annotation | emma:interpretation |
---|---|
Definition | The emma:interpretation element acts as a wrapper for application instance data or lattices. |
Children | The emma:interpretation element MUST immediately contain either application instance data, or a single emma:lattice element, or a single emma:literal element; in the case of uninterpreted input or no input, emma:interpretation MUST be empty. It MAY also contain multiple optional emma:derived-from elements and an optional single emma:info element. It MAY also contain multiple optional emma:annotation elements. It MAY also contain multiple emma:parameters elements. It MAY also contain a single optional emma:grammar-active element. It MAY also contain a single emma:location element. |
Attributes | |
Applies to | The emma:interpretation element is legal only as a child of emma:emma, emma:group, emma:one-of, emma:sequence, or emma:derivation. |
The emma:interpretation element holds a single interpretation represented in application specific markup, or a single emma:lattice element, or a single emma:literal element.
The emma:interpretation element MUST be empty if it is marked with emma:no-input="true" (Section 4.2.3). The emma:interpretation element MUST be empty if it has been annotated with emma:uninterpreted="true" (Section 4.2.4) or emma:function="recording" (Section 4.2.11).
Attributes:
- id: an xsd:ID value that uniquely identifies the interpretation within the EMMA document.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    ...
  </emma:interpretation>
</emma:emma>
While emma:medium and emma:mode are optional on emma:interpretation, note that all EMMA interpretations must be annotated for emma:medium and emma:mode, so either these attributes must appear directly on emma:interpretation, or they must appear on an ancestor emma:one-of node, or they must appear on an earlier stage of the derivation listed in emma:derivation.
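The derivation case can be illustrated with a minimal sketch (the ids and tokens here are hypothetical): the final interpretation carries no emma:medium or emma:mode of its own, but they are recoverable from the earlier stage referenced through emma:derived-from.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <!-- earlier processing stage: the recognition result carries medium/mode -->
    <emma:interpretation id="raw1" emma:medium="acoustic" emma:mode="voice"
        emma:tokens="flights to denver">
      <emma:literal>flights to denver</emma:literal>
    </emma:interpretation>
  </emma:derivation>
  <!-- final stage: medium/mode inherited via the derivation link -->
  <emma:interpretation id="sem1">
    <emma:derived-from resource="#raw1" composite="false"/>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>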
emma:one-of element

Annotation | emma:one-of |
---|---|
Definition | A container element indicating a disjunction among a collection of mutually exclusive interpretations of the input. |
Children | The emma:one-of element MUST immediately contain a collection of one or more emma:interpretation elements or container elements: emma:one-of, emma:group, emma:sequence. It MAY also contain multiple optional emma:derived-from elements and multiple emma:info elements. It MAY also contain multiple optional emma:annotation elements. It MAY also contain multiple optional emma:parameters elements. It MAY also contain a single optional emma:grammar-active element. It MAY also contain a single emma:lattice element containing the lattice result for the same input. It MAY also contain a single emma:location element. |
Attributes | |
Applies to | The emma:one-of element MAY only appear as a child of emma:emma, emma:one-of, emma:group, emma:sequence, or emma:derivation. |
The emma:one-of element acts as a container for a collection of one or more interpretation (emma:interpretation) or container elements (emma:one-of, emma:group, emma:sequence), and denotes that these are mutually exclusive interpretations.
An N-best list of choices in EMMA MUST be represented as a set of emma:interpretation elements contained within an emma:one-of element. For instance, a series of different recognition results in speech recognition might be represented in this way.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice"
      ref="http://www.example.com/i156/emma.xml#r1">
    <emma:interpretation>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation>
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
The function of the emma:one-of element is to represent a disjunctive list of possible interpretations of a user input. A disjunction of possible interpretations of an input can be the result of different kinds of processing or ambiguity. One source is multiple results from a recognition technology such as speech or handwriting recognition. Multiple results can also occur from parsing or understanding natural language. Another possible source of ambiguity is from the application of multiple different kinds of recognition or understanding components to the same input signal. For example, a single ink input signal might be processed by both handwriting recognition and gesture recognition. Another is the use of more than one recording device for the same input (multiple microphones).
The optional ref attribute indicates a location where a copy of the content within the emma:one-of element can be retrieved from an external document, possibly located on a remote server.
In order to make explicit these different kinds of multiple interpretations and allow for concise statement of the annotations associated with each, the emma:one-of element MAY appear within another emma:one-of element. If emma:one-of elements are nested then they MUST indicate the kind of disjunction using the attribute disjunction-type. The values of disjunction-type are {recognition, understanding, multi-device, multi-process}. For the most common use case, where there are multiple recognition results and some of them have multiple interpretations, the top-level emma:one-of is disjunction-type="recognition" and the embedded emma:one-of has the attribute disjunction-type="understanding".
As an example, in an interactive flight reservation application, if recognition yielded 'Boston' or 'Austin' and each had a semantic interpretation as either the assertion of a city name or the specification of a flight query with the city as the destination, this would be represented as follows in EMMA:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of disjunction-type="recognition"
      emma:start="12457990" emma:end="12457995"
      emma:medium="acoustic" emma:mode="voice">
    <emma:one-of disjunction-type="understanding" emma:tokens="boston">
      <emma:interpretation>
        <assert><city>boston</city></assert>
      </emma:interpretation>
      <emma:interpretation>
        <flight><dest><city>boston</city></dest></flight>
      </emma:interpretation>
    </emma:one-of>
    <emma:one-of disjunction-type="understanding" emma:tokens="austin">
      <emma:interpretation>
        <assert><city>austin</city></assert>
      </emma:interpretation>
      <emma:interpretation>
        <flight><dest><city>austin</city></dest></flight>
      </emma:interpretation>
    </emma:one-of>
  </emma:one-of>
</emma:emma>
EMMA MAY explicitly represent ambiguity resulting from different processes, devices, or sources using embedded emma:one-of and the disjunction-type attribute. Multiple different interpretations resulting from different factors MAY also be listed within a single unstructured emma:one-of, though in this case it is more complex or impossible to uncover the sources of the ambiguity if required by later stages of processing. If there is no embedding in emma:one-of, then the disjunction-type attribute is not required. If the disjunction-type attribute is missing then by default the source of disjunction is unspecified.
The example case above could also be represented as:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:start="12457990" emma:end="12457995"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:tokens="boston">
      <assert><city>boston</city></assert>
    </emma:interpretation>
    <emma:interpretation emma:tokens="boston">
      <flight><dest><city>boston</city></dest></flight>
    </emma:interpretation>
    <emma:interpretation emma:tokens="austin">
      <assert><city>austin</city></assert>
    </emma:interpretation>
    <emma:interpretation emma:tokens="austin">
      <flight><dest><city>austin</city></dest></flight>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
But in this case information about which interpretations resulted from speech recognition and which resulted from language understanding is lost.
A list of emma:interpretation elements within an emma:one-of MUST be sorted best-first by some measure of quality. The quality measure is emma:confidence if present; otherwise, the quality metric is platform-specific.
With embedded emma:one-of structures there is no requirement for the confidence scores within different emma:one-of elements to be on the same scale. For example, the scores assigned by handwriting recognition might not be comparable to those assigned by gesture recognition. Similarly, if multiple recognizers are used there is no guarantee that their confidence scores will be comparable. For this reason the ordering requirement on emma:interpretation within emma:one-of only applies locally to sister emma:interpretation elements within each emma:one-of. There is no requirement on the ordering of embedded emma:one-of elements within a higher emma:one-of element.
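For instance, a minimal sketch (the application elements and scores are hypothetical) of an ink signal processed by both handwriting and gesture recognition: each inner emma:one-of is sorted best-first on its own scale, and no ordering holds between the two inner lists.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of disjunction-type="multi-process"
      emma:medium="tactile" emma:mode="ink">
    <!-- handwriting recognition: locally sorted on its own confidence scale -->
    <emma:one-of disjunction-type="recognition">
      <emma:interpretation emma:confidence="0.9"><word>box</word></emma:interpretation>
      <emma:interpretation emma:confidence="0.4"><word>boy</word></emma:interpretation>
    </emma:one-of>
    <!-- gesture recognition: a separate, non-comparable scale -->
    <emma:one-of disjunction-type="recognition">
      <emma:interpretation emma:confidence="0.6"><shape>rectangle</shape></emma:interpretation>
      <emma:interpretation emma:confidence="0.2"><shape>arrow</shape></emma:interpretation>
    </emma:one-of>
  </emma:one-of>
</emma:emma>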
While emma:medium and emma:mode are optional on emma:one-of, note that all EMMA interpretations must be annotated for emma:medium and emma:mode, so either these annotations must appear directly on all of the contained emma:interpretation elements within the emma:one-of, or they must appear on the emma:one-of element itself, or they must appear on an ancestor emma:one-of element, or they must appear on an earlier stage of the derivation listed in emma:derivation.
An important use case for ref on emma:one-of is to allow an EMMA processor to return an abbreviated list of elements such as emma:interpretation within an emma:one-of and use the ref attribute to provide a reference to a more fully specified set. In these cases, the emma:one-of MUST be annotated with the emma:partial-content="true" attribute.
In the following example the EMMA document received has the two interpretations within emma:one-of. The emma:partial-content="true" annotation provides an indication that there are more interpretations and those can be retrieved by accessing the URI in ref: "http://www.example.com/emma_021210_10.xml#r1".
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice"
      ref="http://www.example.com/emma_021210_10.xml#r1"
      emma:partial-content="true">
    <emma:interpretation emma:tokens="from boston to denver" emma:confidence="0.9">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:tokens="from austin to denver" emma:confidence="0.7">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
Where the document at "http://www.example.com/emma_021210_10.xml" is as follows, and there are two more interpretations within the emma:one-of with id "r1".
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice"
      emma:partial-content="false">
    <emma:interpretation emma:tokens="from boston to denver" emma:confidence="0.9">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:tokens="from austin to denver" emma:confidence="0.7">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:tokens="from tustin to denver" emma:confidence="0.3">
      <origin>Tustin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation emma:tokens="from tustin to dallas" emma:confidence="0.1">
      <origin>Tustin</origin>
      <destination>Dallas</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
It is also possible to specify a lattice of results alongside an N-best list of interpretations in emma:one-of. A single emma:lattice element can appear as a child of emma:one-of and contains a lattice representation of the processing of the same input resulting in the interpretations that appear within the emma:one-of. In this example, there are two N-best results and the emma:lattice enumerates two more as it includes arcs for "tomorrow" vs "today".
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:tokens="flights from boston to denver tomorrow">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
    <emma:interpretation emma:tokens="flights from austin to denver tomorrow">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
    <emma:lattice initial="1" final="7">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">from</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="3" to="4">austin</emma:arc>
      <emma:arc from="4" to="5">to</emma:arc>
      <emma:arc from="5" to="6">denver</emma:arc>
      <emma:arc from="6" to="7">today</emma:arc>
      <emma:arc from="6" to="7">tomorrow</emma:arc>
    </emma:lattice>
  </emma:one-of>
</emma:emma>
emma:group element

Annotation | emma:group |
---|---|
Definition | A container element indicating that a number of interpretations of distinct user inputs are grouped according to some criteria. |
Children | The emma:group element MUST immediately contain a collection of one or more emma:interpretation elements or container elements: emma:one-of, emma:group, emma:sequence. It MAY also contain an optional single emma:group-info element. It MAY also contain multiple optional emma:derived-from elements and multiple emma:info elements. It MAY also contain multiple optional emma:annotation elements. It MAY also contain multiple optional emma:parameters elements. It MAY also contain a single optional emma:grammar-active element. It MAY also contain a single emma:location element. |
Attributes | |
Applies to | The emma:group element is legal only as a child of emma:emma, emma:one-of, emma:group, emma:sequence, or emma:derivation. |
The emma:group element is used to indicate that the contained interpretations are from distinct user inputs that are related in some manner. emma:group MUST NOT be used for containing the multiple stages of processing of a single user input. Those MUST be contained in the emma:derivation element instead (Section 4.1.2). For groups of inputs in temporal order the more specialized container emma:sequence MUST be used (Section 3.3.3). The following example shows three interpretations derived from the speech input "Move this ambulance here" and the tactile input related to two consecutive points on a map.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:group emma:start="1087995961542" emma:end="1087995964542">
    <emma:interpretation emma:medium="acoustic" emma:mode="voice">
      <action>move</action>
      <object>ambulance</object>
      <destination>here</destination>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>
The emma:one-of and emma:group containers MAY be nested arbitrarily.
Like emma:one-of, the contents of emma:group may be partial, indicated by emma:partial-content="true", with the full set of group members retrieved by accessing the element referenced in ref.
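A minimal sketch (the URI and fragment id are hypothetical): only one group member is delivered inline, and the full group is available at the referenced element.

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <!-- one member inline; the remaining members are at the ref URI -->
  <emma:group emma:partial-content="true"
      ref="http://www.example.com/emma_031510_07.xml#g1">
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>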
emma:group-info element

Annotation | emma:group-info |
---|---|
Definition | The emma:group-info element contains or references criteria used in establishing the grouping of interpretations in an emma:group element. |
Children | The emma:group-info element MUST either immediately contain inline instance data specifying grouping criteria or have the attribute ref referencing the criteria. |
Attributes | |
Applies to | The emma:group-info element is legal only as a child of emma:group. |
Sometimes it may be convenient to indirectly associate a given group with information, such as grouping criteria. The emma:group-info element might be used to make explicit the criteria by which members of a group are associated. In the following example, a group of two points is associated with a description of grouping criteria based upon a sliding temporal window of two seconds duration.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/ns/group">
  <emma:group>
    <emma:group-info>
      <ex:mode>temporal</ex:mode>
      <ex:duration>2s</ex:duration>
    </emma:group-info>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>
You might also use emma:group-info to refer to a named grouping criterion using an external reference, for instance:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/ns/group">
  <emma:group>
    <emma:group-info ref="http://www.example.com/criterion42"/>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:group>
</emma:emma>
emma:sequence element

Annotation | emma:sequence |
---|---|
Definition | A container element indicating that a number of interpretations of distinct user inputs are in temporal sequence. |
Children | The emma:sequence element MUST immediately contain a collection of one or more emma:interpretation elements or container elements: emma:one-of, emma:group, emma:sequence. It MAY also contain multiple optional emma:derived-from elements and multiple emma:info elements. It MAY also contain multiple optional emma:annotation elements. It MAY also contain multiple optional emma:parameters elements. It MAY also contain a single optional emma:grammar-active element. It MAY also contain a single emma:location element. |
Attributes | |
Applies to | The emma:sequence element is legal only as a child of emma:emma, emma:one-of, emma:group, emma:sequence, or emma:derivation. |
The emma:sequence element is used to indicate that the contained interpretations are sequential in time, as in the following example, which indicates that two points made with a pen are in temporal order.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:sequence>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>
The emma:sequence container MAY be combined with emma:one-of and emma:group in arbitrary nesting structures. The order of children in the content of the emma:sequence element corresponds to a sequence of interpretations. This ordering does not imply any particular definition of sequentiality. EMMA processors are expected therefore to use the emma:sequence element to hold interpretations which are either strictly sequential in nature (e.g. the end-time of an interpretation precedes the start-time of its follower), or which overlap in some manner (e.g. the start-time of a follower interpretation precedes the end-time of its precedent). It is possible to use timestamps to provide fine grained annotation for the sequence of interpretations that are sequential in time (see Section 4.2.10).
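For instance, a minimal sketch (the timestamp values are hypothetical) in which absolute timestamps make the strict ordering of two pen points explicit:

<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:sequence>
    <!-- absolute timestamps in milliseconds: the first point ends
         before the second begins -->
    <emma:interpretation emma:medium="tactile" emma:mode="ink"
        emma:start="1087995961542" emma:end="1087995961842">
      <x>0.253</x>
      <y>0.124</y>
    </emma:interpretation>
    <emma:interpretation emma:medium="tactile" emma:mode="ink"
        emma:start="1087995962142" emma:end="1087995962442">
      <x>0.866</x>
      <y>0.724</y>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>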
In the following more complex example, a sequence of two pen gestures in emma:sequence and a speech input in emma:interpretation are contained in an emma:group.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:group>
    <emma:interpretation emma:medium="acoustic" emma:mode="voice">
      <action>move</action>
      <object>this-battleship</object>
      <destination>here</destination>
    </emma:interpretation>
    <emma:sequence>
      <emma:interpretation emma:medium="tactile" emma:mode="ink">
        <x>0.253</x>
        <y>0.124</y>
      </emma:interpretation>
      <emma:interpretation emma:medium="tactile" emma:mode="ink">
        <x>0.866</x>
        <y>0.724</y>
      </emma:interpretation>
    </emma:sequence>
  </emma:group>
</emma:emma>
Like emma:one-of, the contents of emma:sequence may be partial, indicated by emma:partial-content="true", with the full set of sequence members retrieved by accessing the element referenced in ref.
In addition to providing the ability to represent N-best lists of interpretations using emma:one-of, EMMA also provides the capability to represent lattices of words or other symbols using the emma:lattice element. Lattices provide a compact representation of large lists of possible recognition results or interpretations for speech, pen, or multimodal inputs.
In addition to providing a representation for lattice output from speech recognition, another important use case for lattices is for representation of the results of gesture and handwriting recognition from a pen modality component. Lattices can also be used to compactly represent multiple possible meaning representations. Another use case for the lattice representation is for associating confidence scores and other annotations with individual words within a speech recognition result string.
Lattices are compactly described by a list of transitions between nodes. For each transition the start and end nodes MUST be defined, along with the label for the transition. Initial and final nodes MUST also be indicated. The following figure provides a graphical representation of a speech recognition lattice which compactly represents eight different sequences of words.
which expands to:
a. flights to boston from portland today please
b. flights to austin from portland today please
c. flights to boston from oakland today please
d. flights to austin from oakland today please
e. flights to boston from portland tomorrow
f. flights to austin from portland tomorrow
g. flights to boston from oakland tomorrow
h. flights to austin from oakland tomorrow
emma:lattice, emma:arc, emma:node elements

Annotation | emma:lattice |
---|---|
Definition | An element which encodes a lattice representation of user input. |
Children | The emma:lattice element MUST immediately contain one or more emma:arc elements and zero or more emma:node elements. |
Attributes | |
Applies to | The emma:lattice element is legal only as a child of the emma:interpretation and emma:one-of elements. |

Annotation | emma:arc |
---|---|
Definition | An element which encodes a transition between two nodes in a lattice. The label associated with the arc in the lattice is represented in the content of emma:arc. |
Children | The emma:arc element MUST immediately contain either character data or a single application namespace element, or be empty in the case of epsilon transitions. It MAY contain an emma:info element containing application or vendor specific annotations. It MAY contain zero or more optional emma:annotation elements containing annotations made by a human annotator. |
Attributes | |
Applies to | The emma:arc element is legal only as a child of the emma:lattice element. |

Annotation | emma:node |
---|---|
Definition | An element which represents a node in the lattice. The emma:node elements are not required to describe a lattice but might be added to provide a location for annotations on nodes in a lattice. There MUST be at most one emma:node specification for each numbered node in the lattice. |
Children | An OPTIONAL emma:info element for application or vendor specific annotations on the node. It MAY contain zero or more optional emma:annotation elements containing annotations made by a human annotator. |
Attributes | |
Applies to | The emma:node element is legal only as a child of the emma:lattice element. |
In EMMA, a lattice is represented using an element emma:lattice, which has attributes initial and final for indicating the initial and final nodes of the lattice. For the lattice below, this will be: <emma:lattice initial="1" final="8"/>. The nodes are numbered with integers. If there is more than one distinct final node in the lattice the nodes MUST be represented as a space separated list in the value of the final attribute, e.g. <emma:lattice initial="1" final="9 10 23"/>. There MUST only be one initial node in an EMMA lattice. Each transition in the lattice is represented as an element emma:arc with attributes from and to which indicate the nodes where the transition starts and ends. The arc's label is represented as the content of the emma:arc element and MUST be well-formed character or XML content. In the example here the contents are words. Empty (epsilon) transitions in a lattice MUST be represented in the emma:lattice representation as empty emma:arc elements, e.g. <emma:arc from="1" to="8"/>.
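As a minimal sketch (the word labels are hypothetical), the following fragment shows both an epsilon transition and a lattice with more than one final node:

<emma:lattice initial="1" final="3 4">
  <emma:arc from="1" to="2">flights</emma:arc>
  <emma:arc from="2" to="3">today</emma:arc>
  <emma:arc from="2" to="4">tomorrow</emma:arc>
  <!-- epsilon (empty) transition: "flights" alone also reaches final node 3 -->
  <emma:arc from="2" to="3"/>
</emma:lattice>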
The example speech lattice above would be represented in EMMA markup as follows:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="8">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">to</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="3" to="4">austin</emma:arc>
      <emma:arc from="4" to="5">from</emma:arc>
      <emma:arc from="5" to="6">portland</emma:arc>
      <emma:arc from="5" to="6">oakland</emma:arc>
      <emma:arc from="6" to="7">today</emma:arc>
      <emma:arc from="7" to="8">please</emma:arc>
      <emma:arc from="6" to="8">tomorrow</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
Alternatively, if we wish to represent the same information as an N-best list using emma:one-of, we would have the more verbose representation:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation>
      <text>flights to boston from portland today please</text>
    </emma:interpretation>
    <emma:interpretation id="interp2">
      <text>flights to boston from portland tomorrow</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to austin from portland today please</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to austin from portland tomorrow</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to boston from oakland today please</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to boston from oakland tomorrow</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to austin from oakland today please</text>
    </emma:interpretation>
    <emma:interpretation>
      <text>flights to austin from oakland tomorrow</text>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
The lattice representation avoids the need to enumerate all of the possible word sequences. Also, as detailed below, the emma:lattice representation enables placement of annotations on individual words in the input.
For use cases involving the representation of gesture/ink lattices and use cases involving lattices of semantic interpretations, EMMA allows for application namespace elements to appear within emma:arc.
For example, a sequence of two gestures, each of which is recognized as either a line or a circle, might be represented as follows:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="tactile" emma:mode="ink">
    <emma:lattice initial="1" final="3">
      <emma:arc from="1" to="2">
        <circle radius="100"/>
      </emma:arc>
      <emma:arc from="2" to="3">
        <line length="628"/>
      </emma:arc>
      <emma:arc from="1" to="2">
        <circle radius="200"/>
      </emma:arc>
      <emma:arc from="2" to="3">
        <line length="1256"/>
      </emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
As an example of a lattice of semantic interpretations, in a travel application where the source is either "Boston" or "Austin" and the destination is either "Newark" or "New York", the possibilities might be represented in a lattice as follows:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="3">
      <emma:arc from="1" to="2">
        <source city="boston"/>
      </emma:arc>
      <emma:arc from="2" to="3">
        <destination city="newark"/>
      </emma:arc>
      <emma:arc from="1" to="2">
        <source city="austin"/>
      </emma:arc>
      <emma:arc from="2" to="3">
        <destination city="new york"/>
      </emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
The emma:arc element MAY contain either an application namespace element or character data. It MUST NOT contain combinations of application namespace elements and character data. However, an emma:info element MAY appear within an emma:arc element alongside character data, in order to allow for the association of vendor or application specific annotations on a single word or symbol in a lattice. Also an emma:annotation element may appear as a child of emma:arc or emma:node indicating human annotations on the arc or node.
So, in summary, there are four groupings of content that can appear within emma:arc:

- character data alone
- character data along with an emma:info element providing vendor or application specific annotations that apply to the character data
- a single application namespace element alone
- a single application namespace element along with an emma:info element providing vendor or application specific annotations

The ref attribute on emma:lattice can be used for cases where the lattice is not returned in the document, but is made accessible through ref, or for cases where the lattice is partial and a full lattice is available on the server.
For example, the following emma:lattice does not contain any emma:arc elements, but ref indicates where the lattice can be retrieved from.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice"
      emma:tokens="flights to boston from oakland tomorrow">
    <emma:lattice initial="1" final="8" emma:partial-content="true"
        ref="http://www.example.com/ex1/lattice.xml#l1"/>
  </emma:interpretation>
</emma:emma>

The document on the server in this case could, for example, be as follows.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice"
      emma:tokens="flights to boston from oakland tomorrow">
    <emma:lattice initial="1" final="8" emma:partial-content="false">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">to</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="3" to="4">austin</emma:arc>
      <emma:arc from="4" to="5">from</emma:arc>
      <emma:arc from="5" to="6">portland</emma:arc>
      <emma:arc from="5" to="6">oakland</emma:arc>
      <emma:arc from="6" to="7">today</emma:arc>
      <emma:arc from="7" to="8">please</emma:arc>
      <emma:arc from="6" to="8">tomorrow</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
Similarly, the emma:lattice could have some arcs but not all and point through ref to the full lattice. In this case the EMMA document received is a pruned lattice, and the full lattice can be retrieved by accessing the external document indicated in ref.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="8" emma:partial-content="true"
        ref="http://www.example.com/ex1/lattice.xml#l1">
      <emma:arc from="1" to="2">flights</emma:arc>
      <emma:arc from="2" to="3">to</emma:arc>
      <emma:arc from="3" to="4">boston</emma:arc>
      <emma:arc from="4" to="5">from</emma:arc>
      <emma:arc from="5" to="6">portland</emma:arc>
      <emma:arc from="6" to="8">tomorrow</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
The encoding of lattice arcs as XML elements (emma:arc) enables arcs to be annotated with metadata such as timestamps, costs, or confidence scores:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="8">
      <emma:arc from="1" to="2" emma:start="1087995961542"
          emma:end="1087995962042" emma:cost="30">
        flights
        <emma:annotation annotator="john_smith"
            time="2011-11-10T09:00:21" type="emotion"
            confidence="1.0" reference="false">
          <emotionml xmlns="http://www.w3.org/2009/10/emotionml">
            <emotion>
              <category set="everyday" name="angry"/>
              <modality medium="acoustic" mode="voice"/>
            </emotion>
          </emotionml>
        </emma:annotation>
      </emma:arc>
      <emma:arc from="2" to="3" emma:start="1087995962042"
          emma:end="1087995962542" emma:cost="20">
        to
      </emma:arc>
      <emma:arc from="3" to="4" emma:start="1087995962542"
          emma:end="1087995963042" emma:cost="50">
        boston
      </emma:arc>
      <emma:arc from="3" to="4" emma:start="1087995963042"
          emma:end="1087995963742" emma:cost="60">
        austin
      </emma:arc>
      ...
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
The following EMMA attributes MAY be placed on emma:arc elements: absolute timestamps (emma:start, emma:end), relative timestamps (emma:offset-to-start, emma:duration), emma:confidence, emma:cost, the human language of the input (emma:lang), emma:medium, emma:mode, emma:source, and emma:annotated-tokens. The use case for emma:medium, emma:mode, and emma:source is for lattices which contain content from different input modes. The emma:arc element MAY also contain an emma:info element for specification of vendor and application specific annotations on the arc. The emma:arc and emma:node elements can also contain optional emma:annotation elements containing annotations made by human annotators. For example, in the example above emma:annotation is used to indicate manual annotation of emotion on the word 'flights'.
The timestamps that appear on emma:arc elements do not necessarily indicate the start and end of the arc itself. They MAY indicate the start and end of the signal corresponding to the label on the arc. As a result, there is no requirement that the emma:end timestamp on an arc going into a node be equivalent to the emma:start of all arcs going out of that node. Furthermore, there is no guarantee that the left-to-right order of arcs in a lattice will correspond to the temporal order of the input signal. The lattice representation is an abstraction that represents a range of possible interpretations of a user's input and is not necessarily intended to be a representation of temporal order.
Costs are typically application and device dependent. There are a variety of ways that individual arc costs might be combined to produce costs for specific paths through the lattice. This specification does not standardize the way these costs are combined; it is up to the applications and devices to determine how such derived costs are computed and used.
For some lattice formats, it is also desirable to annotate the nodes in the lattice themselves with information such as costs. For example, in speech recognition, costs might be placed on nodes as a result of word penalties or redistribution of costs. For this purpose EMMA also provides an emma:node element which can host annotations such as emma:cost. The emma:node element MUST have an attribute node-number which indicates the number of the node. There MUST be at most one emma:node specification for a given numbered node in the lattice. In our example, if there were a cost of 100 on the final state, this could be represented as follows:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="8">
      <emma:arc from="1" to="2" emma:start="1087995961542"
          emma:end="1087995962042" emma:cost="30">
        flights
      </emma:arc>
      <emma:arc from="2" to="3" emma:start="1087995962042"
          emma:end="1087995962542" emma:cost="20">
        to
      </emma:arc>
      <emma:arc from="3" to="4" emma:start="1087995962542"
          emma:end="1087995963042" emma:cost="50">
        boston
      </emma:arc>
      <emma:arc from="3" to="4" emma:start="1087995963042"
          emma:end="1087995963742" emma:cost="60">
        austin
      </emma:arc>
      ...
      <emma:node node-number="8" emma:cost="100"/>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
The relative timestamp mechanism in EMMA is intended to provide temporal information about arcs in a lattice in relative terms using offsets in milliseconds. In order to do this, the absolute time MAY be specified on emma:interpretation; both emma:time-ref-uri and emma:time-ref-anchor-point apply to emma:lattice and MAY be used there to set the anchor point for offsets to the start of the absolute time specified on emma:interpretation. The offset in milliseconds to the beginning of each arc MAY then be indicated on each emma:arc in the emma:offset-to-start attribute.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="interp1" emma:start="1087995961542"
      emma:end="1087995963042" emma:medium="acoustic" emma:mode="voice">
    <emma:lattice emma:time-ref-uri="#interp1"
        emma:time-ref-anchor-point="start" initial="1" final="4">
      <emma:arc from="1" to="2" emma:offset-to-start="0">
        flights
      </emma:arc>
      <emma:arc from="2" to="3" emma:offset-to-start="500">
        to
      </emma:arc>
      <emma:arc from="3" to="4" emma:offset-to-start="1000">
        boston
      </emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
Note that the offset for the first emma:arc MUST always be zero, since the EMMA attribute emma:offset-to-start indicates the number of milliseconds from the anchor point to the start of the piece of input associated with the emma:arc, in this case the word "flights".
emma:literal element

Annotation | emma:literal
---|---
Definition | An element that contains string literal output.
Children | String literal
Attributes | An optional emma:result-format attribute.
Applies to | The emma:literal element is a child of emma:interpretation.
Certain EMMA processing components produce semantic results in the form of string literals without any surrounding application namespace markup. These MUST be placed within the EMMA element emma:literal within emma:interpretation. For example, if a semantic interpreter simply returned "boston", this could be represented in EMMA as:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:literal>boston</emma:literal>
  </emma:interpretation>
</emma:emma>
Note that a raw recognition result consisting of a sequence of words from speech recognition is also a kind of string literal and can be contained within emma:literal. For example, recognition of the string "flights to san francisco" can be represented in EMMA as follows:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:literal>flights to san francisco</emma:literal>
  </emma:interpretation>
</emma:emma>
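For non-XML semantic payloads, the optional emma:result-format attribute on emma:literal can indicate the format of the literal content. As an illustrative sketch (the JSON payload and the media type value used here are assumptions, not drawn from this specification's examples):

<emma:interpretation id="r1" emma:medium="acoustic" emma:mode="voice">
  <emma:literal emma:result-format="application/json">{"city": "san francisco"}</emma:literal>
</emma:interpretation>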
This section defines annotations in the EMMA namespace, including both attributes and elements. The values are specified in terms of the data types defined by XML Schema Part 2: Datatypes Second Edition [XMLSchema Datatypes].
emma:model element

Annotation | emma:model
---|---
Definition | The emma:model element either references or provides inline the data model for the instance data.
Children | If a ref attribute is not specified then this element contains the data model inline.
Attributes | An id attribute and an optional ref attribute (URI).
Applies to | The emma:model element MAY appear only as a child of emma:emma.
The data model that may be used to express constraints on the structure and content of instance data is specified as one of the annotations of the instance. Specifying the data model is OPTIONAL, in which case the data model can be said to be implicit. Typically the data model is pre-established by the application.

The data model is specified with the emma:model annotation, defined as an element in the EMMA namespace. If the data model for the contents of an emma:interpretation, container element, or application namespace element is to be specified in EMMA, the attribute emma:model-ref MUST be specified on the emma:interpretation, container element, or application namespace element. Note that since multiple emma:model elements might be specified under emma:emma, it is possible to refer to multiple data models within a single EMMA document. For example, different alternative interpretations under an emma:one-of might have different data models. In this case, an emma:model-ref attribute would appear on each emma:interpretation element in the N-best list, with its value being the id of the emma:model element for that particular interpretation.

The data model is closely related to the interpretation data, and is typically specified as the annotation related to the emma:interpretation or emma:one-of elements.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:model id="model1" ref="http://example.com/models/city.xml"/>
  <emma:interpretation emma:model-ref="model1"
      emma:medium="acoustic" emma:mode="voice">
    <city>London</city>
    <country>UK</country>
  </emma:interpretation>
</emma:emma>
The emma:model annotation MAY reference any element or attribute in the application instance data, as well as any EMMA container element (emma:one-of, emma:group, or emma:sequence).
The data model annotation MAY be used to either reference an external data model with the ref attribute or provide a data model as inline content. Either a ref attribute or an inline data model (but not both) MUST be specified.
Note that unlike the use of ref on e.g. emma:one-of, it is not possible in EMMA to provide a partial specification of the data model inline and use emma:partial-content="true" to indicate that the full data model is available from the URI in ref.
emma:derived-from element and emma:derivation element

Annotation | emma:derived-from
---|---
Definition | An empty element which provides a reference to the interpretation from which the element it appears on was derived.
Children | None
Attributes | A resource attribute (URI) and a composite attribute (boolean).
Applies to | The emma:derived-from element is legal only as a child of emma:interpretation, emma:one-of, emma:group, or emma:sequence.

Annotation | emma:derivation
---|---
Definition | An element which contains interpretation and container elements representing earlier stages in the processing of the input.
Children | One or more emma:interpretation, emma:one-of, emma:sequence, or emma:group elements.
Attributes | None
Applies to | The emma:derivation element MAY appear only as a child of the emma:emma, emma:interpretation, emma:one-of, emma:group, and emma:sequence elements.
Instances of interpretations are in general derived from other instances of interpretation in a process that goes from raw data to increasingly refined representations of the input. The derivation annotation is used to link any two interpretations that are related by representing the source and the outcome of an interpretation process. For instance, a speech recognition process can return the following result in the form of raw text:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <answer>From Boston to Denver tomorrow</answer>
  </emma:interpretation>
</emma:emma>
A first interpretation process will produce:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>tomorrow</date>
  </emma:interpretation>
</emma:emma>
A second interpretation process, aware of the current date, will be able to produce a more refined instance, such as:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>20030315</date>
  </emma:interpretation>
</emma:emma>
The interaction manager might need to have access to the three levels of interpretation. The emma:derived-from annotation element can be used to establish a chain of derivation relationships, as in the following example:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="raw"
        emma:medium="acoustic" emma:mode="voice">
      <answer>From Boston to Denver tomorrow</answer>
    </emma:interpretation>
    <emma:interpretation id="better">
      <emma:derived-from resource="#raw" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation>
    <emma:derived-from resource="#better" composite="false"/>
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>20030315</date>
  </emma:interpretation>
</emma:emma>
The emma:derivation element MAY be used as a container for representations of the earlier stages in the interpretation of the input. The emma:derivation element MAY appear only as a child of the emma:emma, emma:interpretation, emma:one-of, emma:group, and emma:sequence elements. That is, it can be a child of emma:emma, or of any container element except literal or lattice. If emma:derivation appears within a container, it MUST apply to that specific element, or to a descendant of that element. The latest stage of processing MUST be a direct child of emma:emma.
The resource attribute on emma:derived-from is a URI which can reference IDs in the current or other EMMA documents. Since emma:derivation elements can appear in multiple different places, EMMA processors MUST use the emma:derived-from element to identify earlier stages of the processing of an input, rather than the document structure. The option to have emma:derivation in locations other than directly under emma:emma is provided to make the document more transparent and improve human readability.
In the following example, emma:sequence is used to represent a sequence of two spoken inputs, and each has its own emma:derivation element containing the previous stage of processing.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:sequence>
    <emma:interpretation>
      <emma:derived-from resource="#raw1" composite="false"/>
      <origin>Boston</origin>
      <emma:derivation>
        <emma:interpretation id="raw1"
            emma:medium="acoustic" emma:mode="voice">
          <emma:literal>flights from boston</emma:literal>
        </emma:interpretation>
      </emma:derivation>
    </emma:interpretation>
    <emma:interpretation>
      <emma:derived-from resource="#raw2" composite="false"/>
      <destination>Denver</destination>
      <emma:derivation>
        <emma:interpretation id="raw2"
            emma:medium="acoustic" emma:mode="voice">
          <emma:literal>to denver</emma:literal>
        </emma:interpretation>
      </emma:derivation>
    </emma:interpretation>
  </emma:sequence>
</emma:emma>
In addition to representing sequential derivations, the EMMA emma:derived-from element can also be used to capture composite derivations. Composite derivations involve combination of inputs from different modes.
In order to indicate whether an emma:derived-from element describes a sequential derivation step or a composite derivation step, the emma:derived-from element has an attribute composite which has a boolean value. A composite emma:derived-from MUST be marked as composite="true", while a sequential emma:derived-from element is marked as composite="false". If this attribute is not specified, the value is false by default.

In the following composite derivation example, the user said "destination" using the voice mode and circled Boston on a map using the ink mode:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:interpretation id="voice1" emma:start="1087995961500"
        emma:end="1087995962542"
        emma:process="http://example.com/myasr.xml"
        emma:source="http://example.com/microphone/NC-61"
        emma:signal="http://example.com/signals/sg23.wav"
        emma:confidence="0.6" emma:medium="acoustic" emma:mode="voice"
        emma:function="dialog" emma:verbal="true" emma:lang="en-US"
        emma:tokens="destination">
      <rawinput>destination</rawinput>
    </emma:interpretation>
    <emma:interpretation id="ink1" emma:start="1087995961600"
        emma:end="1087995964000"
        emma:process="http://example.com/mygesturereco.xml"
        emma:source="http://example.com/pen/wacom123"
        emma:signal="http://example.com/signals/ink5.inkml"
        emma:confidence="0.5" emma:medium="tactile" emma:mode="ink"
        emma:function="dialog" emma:verbal="false">
      <rawinput>Boston</rawinput>
    </emma:interpretation>
  </emma:derivation>
  <emma:interpretation emma:confidence="0.3"
      emma:start="1087995961500" emma:end="1087995964000"
      emma:medium="acoustic tactile" emma:mode="voice ink"
      emma:function="dialog" emma:verbal="true" emma:lang="en-US"
      emma:tokens="destination">
    <emma:derived-from resource="#voice1" composite="true"/>
    <emma:derived-from resource="#ink1" composite="true"/>
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>
In this example, annotations on the multimodal interpretation indicate the process used for the integration, and there are two emma:derived-from elements, one pointing to the speech and one pointing to the pen gesture.
The only constraints the EMMA specification places on the annotations that appear on a composite input are that the emma:medium attribute MUST contain the union of the emma:medium attributes on the combining inputs, represented as a space delimited set of nmtokens as defined in Section 4.2.11, and that the emma:mode attribute MUST contain the union of the emma:mode attributes on the combining inputs, represented as a space delimited set of nmtokens as defined in Section 4.2.11. In the example above, this means that the emma:medium value is "acoustic tactile" and the emma:mode attribute is "voice ink". How all other annotations are handled is author defined. In the following paragraph, informative examples of how specific annotations might be handled are given.
With reference to the illustrative example above, this paragraph provides informative guidance regarding the determination of annotations (beyond emma:medium and emma:mode) on a composite multimodal interpretation. Generally, the timestamp on a combined input should contain the intervals indicated by the combining inputs. For the absolute timestamps emma:start and emma:end this can be achieved by taking the earlier of the emma:start values (emma:start="1087995961500" in our example) and the later of the emma:end values (emma:end="1087995964000" in the example). The determination of relative timestamps for composite inputs is more complex; informative guidance is given in Section 4.2.10.4. Generally speaking, the emma:confidence value will be some numerical combination of the confidence scores assigned to the combining inputs. In our example, it is the result of multiplying the voice and ink confidence scores (0.6 × 0.5 = 0.3). In other cases there may not be a confidence score for one of the combining inputs, and the author may choose to copy the confidence score from the input which does have one. Generally, for emma:verbal, if either of the inputs has the value true then the multimodal interpretation will also be emma:verbal="true", as in the example. In other words, the annotation for the composite input is the result of an inclusive OR of the boolean values of the annotations on the inputs. If an annotation is only specified on one of the combining inputs, then it may in some cases be assumed to apply to the multimodal interpretation of the composite input. In the example, emma:lang="en-US" is only specified for the speech input, and this annotation appears on the composite result also. Similarly, in our example only the voice input has emma:tokens, and the author has chosen to annotate the combined input with the same emma:tokens value. In this example, the emma:function is the same on both combining inputs, and the author has chosen to use the same annotation on the composite interpretation.
In annotating derivations of the processing of the input, EMMA provides the flexibility of both coarse-grained and fine-grained annotation of relations among interpretations. For example, when relating two N-best lists within emma:one-of elements, either there can be a single emma:derived-from element under emma:one-of referring to the ID of the emma:one-of for the earlier processing stage:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:derivation>
    <emma:one-of id="nbest1" emma:medium="acoustic" emma:mode="voice">
      <emma:interpretation>
        <res>from boston to denver on march eleven two thousand three</res>
      </emma:interpretation>
      <emma:interpretation>
        <res>from austin to denver on march eleven two thousand three</res>
      </emma:interpretation>
    </emma:one-of>
  </emma:derivation>
  <emma:one-of>
    <emma:derived-from resource="#nbest1" composite="false"/>
    <emma:interpretation>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation>
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
Or there can be a separate emma:derived-from element on each emma:interpretation element, referring to the specific emma:interpretation element it was derived from.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of>
    <emma:interpretation>
      <emma:derived-from resource="#int1" composite="false"/>
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation>
      <emma:derived-from resource="#int2" composite="false"/>
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
  <emma:derivation>
    <emma:one-of emma:medium="acoustic" emma:mode="voice">
      <emma:interpretation id="int1">
        <res>from boston to denver on march eleven two thousand three</res>
      </emma:interpretation>
      <emma:interpretation id="int2">
        <res>from austin to denver on march eleven two thousand three</res>
      </emma:interpretation>
    </emma:one-of>
  </emma:derivation>
</emma:emma>
Section 4.3 provides further examples of the use of emma:derived-from to represent sequential derivations and addresses the issue of the scope of EMMA annotations across derivations of user input.
emma:grammar element

Annotation | emma:grammar
---|---
Definition | An element used to indicate the grammar used in processing the input. The grammar MUST either be specified inline OR referenced using the ref attribute.
Children | In the case of inline specification of the grammar, this element contains an element with the specification of the grammar.
Attributes | An id attribute, an optional ref attribute (URI), and an optional grammar-type attribute (MIME type).
Applies to | The emma:grammar element is legal only as a child of the emma:emma element.
The grammar that was used to derive the EMMA result MAY be specified with the emma:grammar annotation, defined as an element in the EMMA namespace. The emma:grammar-ref attribute appears on the specific interpretation and references the appropriate emma:grammar element. The emma:grammar element MUST either contain a representation of the grammar inline OR have a ref attribute which contains a URI referencing the grammar used in processing the input. The optional attribute grammar-type on emma:grammar contains a MIME type indicating the format of the specified grammar. For example, an SRGS grammar in the XML format SHOULD be annotated as grammar-type="application/srgs-xml". The namespace of an inline grammar MUST be specified.
In the following example, there are three interpretations. Each interpretation is annotated with emma:grammar-ref to indicate the grammar that resulted in that interpretation. The two emma:grammar elements indicate the URI for the grammar using the ref attribute. Both grammars are SRGS XML grammars and so are annotated as grammar-type="application/srgs-xml".
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml" ref="someURI"/>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml" ref="anotherURI"/>
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
In the following example, there are two interpretations, each from a different grammar, and the SRGS grammars used to derive the interpretations are specified inline, each as a child of an emma:grammar element. The namespace of the inline grammars is indicated explicitly on each.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml">
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/06/grammar
          http://www.w3.org/TR/speech-grammar/grammar.xsd"
        xml:lang="en" version="1.0" root="state" mode="voice"
        tag-format="semantics/1.0">
      <rule id="state" scope="public">
        <one-of>
          <item>California<tag>out="CA";</tag></item>
          <item>New Jersey<tag>out="NJ";</tag></item>
          <item>New York<tag>out="NY";</tag></item>
        </one-of>
      </rule>
    </grammar>
  </emma:grammar>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml">
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.w3.org/2001/06/grammar
          http://www.w3.org/TR/speech-grammar/grammar.xsd"
        xml:lang="en" version="1.0" root="city" mode="voice"
        tag-format="semantics/1.0">
      <rule id="city" scope="public">
        <one-of>
          <item>Calgary<tag>out="YYC";</tag></item>
          <item>San Francisco<tag>out="SFO";</tag></item>
          <item>Boston<tag>out="BOS";</tag></item>
        </one-of>
      </rule>
    </grammar>
  </emma:grammar>
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:tokens="California" emma:grammar-ref="gram1">
      <emma:literal>CA</emma:literal>
    </emma:interpretation>
    <emma:interpretation emma:tokens="Calgary" emma:grammar-ref="gram2">
      <emma:literal>YYC</emma:literal>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
Non-XML grammar formats, such as the SRGS ABNF format, MUST be contained within <![CDATA[ ... ]]>. Care should be taken in platforms generating EMMA to avoid conflicts between id values in the EMMA markup and those in any inline grammars. Authors should be aware that there could be conflicts between id values used in different embedded inline grammars within an EMMA document.
Note that unlike the use of ref on e.g. emma:one-of, it is not possible in EMMA to provide a partial specification of the grammar inline and use emma:partial-content="true" to indicate that the full grammar is available from the URI in ref.
emma:grammar-active element

Annotation | emma:grammar-active
---|---
Definition | An element used to indicate the grammars active during the processing of an input.
Children | A list of emma:active elements, one for each grammar currently active.
Attributes | An id attribute.
Applies to | emma:interpretation, emma:one-of, emma:group, emma:sequence

Annotation | emma:active
---|---
Definition | An element specifying a particular grammar active during the processing of an input.
Children | None
Attributes | An emma:grammar-ref attribute referencing the active grammar.
Applies to | emma:grammar-active
The default when multiple emma:grammar elements are specified under emma:emma is to assume that all grammars are active for all of the interpretations specified in the top level of the current EMMA document. In certain use cases, such as documents containing results from different microphones or different modalities, this may not be the case, and the set of grammars active for a specific interpretation or set of interpretations should be annotated explicitly using emma:grammar-active. Each grammar which is active is indicated by an emma:active element, which must have an emma:grammar-ref annotation pointing to the specific grammar. For example, to make explicit the fact that both grammars, gram1 and gram2, are active for all three of the N-best interpretations in the following example, an emma:grammar-active element appears as a child of the emma:one-of.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:grammar id="gram1" grammar-type="application/srgs-xml" ref="someURI"/>
  <emma:grammar id="gram2" grammar-type="application/srgs-xml" ref="anotherURI"/>
  <emma:one-of emma:medium="acoustic" emma:mode="voice">
    <emma:grammar-active>
      <emma:active emma:grammar-ref="gram1"/>
      <emma:active emma:grammar-ref="gram2"/>
    </emma:grammar-active>
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram1">
      <origin>Austin</origin>
    </emma:interpretation>
    <emma:interpretation emma:grammar-ref="gram2">
      <command>help</command>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
The use of an element for each active grammar allows for more complex use cases where specific metadata is associated with each active grammar. For example, a weighting or other parameters associated with each active grammar could be specified within an emma:info within emma:active.
emma:info element

Annotation | emma:info
---|---
Definition | The emma:info element acts as a container for vendor and/or application specific metadata regarding a user's input.
Children | One or more elements in the application namespace providing metadata about the input.
Attributes | An optional id attribute, an optional indexed attribute (boolean), and an optional ref attribute (URI).
Applies to | The emma:info element is legal only as a child of the EMMA elements emma:emma, emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, emma:node, or emma:annotation.
In Section 4.2, a series of attributes are defined for representation of metadata about user inputs in a standardized form. EMMA also provides an extensibility mechanism for annotation of user inputs with vendor or application specific metadata not covered by the standard set of EMMA annotations. The element emma:info MUST be used as a container for these annotations, UNLESS they are explicitly covered by emma:endpoint-info. For example, if an input to a dialog system needed to be annotated with the number that the call originated from, the caller's state, some indication of the type of customer, and the name of the service, these pieces of information could be represented within emma:info as in the following example:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:info>
    <caller_id>
      <phone_number>2121234567</phone_number>
      <state>NY</state>
    </caller_id>
    <customer_type>residential</customer_type>
    <service_name>acme_travel_service</service_name>
  </emma:info>
  <emma:one-of emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:confidence="0.75">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.68">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
It is important to have an EMMA container element for application/vendor specific annotations, since EMMA elements provide a structure for representation of multiple possible interpretations of the input. As a result, it is cumbersome to state application/vendor specific metadata as part of the application data within each emma:interpretation. An element is used rather than an attribute so that internal structure can be given to the annotations within emma:info.
In addition to emma:emma, emma:info MAY also appear as a child of other structural elements such as emma:interpretation, emma:one-of and so on. When emma:info appears as a child of one of these elements, the application/vendor specific annotations contained within emma:info are assumed to apply to all of the emma:interpretation elements within the containing element. The semantics of conflicting annotations in emma:info, for example when different values are found within emma:emma and emma:interpretation, are left to the developer of the vendor/application specific annotations.
There may be more than one emma:info element. One function of this is to enable specific interpretations to indicate which emma:info applies to them using emma:info-ref. If emma:info has the optional id attribute, then the emma:info-ref attribute (Section 4.2.19) can be used on emma:interpretation and other container elements to indicate that a particular set of application/vendor specific annotations applies to a particular interpretation.
The emma:info element can therefore have either positional scope (it applies to the element it appears in and the interpretations within it), or index scope, where emma:info-ref attributes are used to show which interpretations a particular emma:info applies to. In order to distinguish emma:info elements that have positional vs. index scope, the indexed attribute must be used. The attribute indexed="true" indicates that the emma:info it appears on does not have positional scope and instead is referenced using emma:info-ref. The attribute indexed="false" indicates that an emma:info has positional scope. The default value if indexed is not specified is false. The indexed attribute is required if and only if there is an emma:info-ref that refers to the id of the emma:info.
The ref attribute can also be used on emma:info instead of placing the application/vendor specific annotations inline. For example, assuming the example above was available at http://example.com/examples/123/emma.xml, the EMMA document delivered to an EMMA consumer could be:
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:info ref="http://example.com/examples/123/emma.xml#info_details"/>
  <emma:one-of emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation emma:confidence="0.75">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.68">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>03112003</date>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
A ref on emma:info can also be used to point to an external document, not necessarily an EMMA document, containing additional annotations on the interpretation. For example, it could be used to point to an XML document providing a list of the specifications of the input device.
emma:endpoint-info element and emma:endpoint element

Annotation | emma:endpoint-info
---|---
Definition | The emma:endpoint-info element acts as a container for all application specific annotation regarding the communication environment.
Children | One or more emma:endpoint elements.
Attributes | An id attribute.
Applies to | The emma:endpoint-info element is legal only as a child of emma:emma.

Annotation | emma:endpoint
---|---
Definition | The element acts as a container for application specific endpoint information.
Children | Elements in the application namespace providing metadata about the input.
Attributes | An id attribute and the endpoint property attributes defined in Section 4.2.14 (e.g. emma:endpoint-address, emma:port-num, emma:port-type).
Applies to | emma:endpoint-info
In order to conduct multimodal interaction, there is a need in EMMA to specify the properties of the endpoint that receives the input which leads to the EMMA annotation. This allows subsequent components to utilize the endpoint properties as well as the annotated inputs to conduct meaningful multimodal interaction. The EMMA element emma:endpoint can be used for this purpose. It can specify the endpoint properties based on a set of common endpoint property attributes in EMMA, such as emma:endpoint-address, emma:port-num, emma:port-type, etc. (Section 4.2.14). Moreover, it provides an extensible annotation structure that allows the inclusion of application and vendor specific endpoint properties.
Note that the usage of the term "endpoint" in this context is different from the way that the term is used in speech processing, where it refers to the end of a speech input. As used here, "endpoint" refers to a network location which is the source or recipient of an EMMA document.

In multimodal interaction, multiple devices can be used and each device can open multiple communication endpoints at the same time. These endpoints are used to transmit and receive data, such as raw input, EMMA documents, etc. The EMMA element emma:endpoint provides a generic representation of endpoint information which is relevant to multimodal interaction. It allows the annotation to be interoperable, and it eliminates the need for EMMA processors to create their own specialized annotations for existing protocols, potential protocols, or yet undefined private protocols that they may use.

Moreover, emma:endpoint-info provides a container to hold all annotations regarding the endpoint information, including emma:endpoint and other application and vendor specific annotations that are related to the communication, allowing the same communication environment to be referenced and used in multiple interpretations.
Note that EMMA provides two locations (i.e. emma:info and emma:endpoint-info) for specifying vendor/application specific annotations. If the annotation is specifically related to the description of the endpoint, then the vendor/application specific annotation SHOULD be placed within emma:endpoint-info; otherwise it SHOULD be placed within emma:info.

The following example illustrates the annotation of endpoint reference properties in EMMA.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example"
    xmlns:ex="http://www.example.com/emma/port">
  <emma:endpoint-info id="audio-channel-1">
    <emma:endpoint id="endpoint1" emma:endpoint-role="sink"
        emma:endpoint-address="135.61.71.103"
        emma:port-num="50204" emma:port-type="rtp"
        emma:endpoint-pair-ref="endpoint2"
        emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
        emma:service-name="travel" emma:mode="voice">
      <ex:app-protocol>SIP</ex:app-protocol>
    </emma:endpoint>
    <emma:endpoint id="endpoint2" emma:endpoint-role="source"
        emma:endpoint-address="136.62.72.104"
        emma:port-num="50204" emma:port-type="rtp"
        emma:endpoint-pair-ref="endpoint1"
        emma:media-type="audio/dsr-202212; rate:8000; maxptime:40"
        emma:service-name="travel" emma:mode="voice">
      <ex:app-protocol>SIP</ex:app-protocol>
    </emma:endpoint>
  </emma:endpoint-info>
  <emma:interpretation emma:start="1087995961542"
      emma:end="1087995963542"
      emma:endpoint-info-ref="audio-channel-1"
      emma:medium="acoustic" emma:mode="voice">
    <destination>Chicago</destination>
  </emma:interpretation>
</emma:emma>
The ex:app-protocol element is provided by the application or the vendor specification. It specifies that the application layer protocol used to establish the speech transmission from the "source" port to the "sink" port is the Session Initiation Protocol (SIP). This is specific to SIP based VoIP communication, in which the actual media transmission and the call signaling that controls the communication sessions are separated and typically based on different protocols. In the above example, the Real-time Transport Protocol (RTP) is used in the media transmission between the source port and the sink port.
emma:process-model element

Annotation | emma:process-model
---|---
Definition | An element used to indicate the model used in processing the input. The model MUST be referenced using the ref attribute, which is URI valued.
Children | None.
Attributes | An id attribute, a ref attribute (URI), and a type attribute indicating the type of model.
Applies to | The emma:process-model element is legal only as a child of the emma:emma element.
The model that was used to derive the EMMA result MAY be specified with the emma:process-model annotation, defined as an element in the EMMA namespace. The emma:process-model-ref attribute appears on the specific interpretation and references the appropriate emma:process-model element. The emma:process-model element MUST have a ref attribute which contains a URI referencing the model used in processing the input. Unlike emma:grammar, emma:process-model does not allow for inline specification of a model. For each emma:process-model element there MUST be an emma:process-model-ref in the document whose value is the id of that emma:process-model. The emma:process-model element cannot have positional scope.
The emma:process-model element MUST have an attribute type containing a string indicating the type of model referenced. The value of type is drawn from an open set including {svm, crf, neural_network, hmm ...}.
Examples of potential uses of emma:process-model include referencing the model used for handwriting recognition or a text classification model used for natural language understanding. The emma:process-model annotation SHOULD be used for input processing models that are not grammars. Grammars SHOULD be referenced or specified inline using emma:grammar. Some input processing modules may utilize both a recognition model and a grammar. For example, for handwriting recognition of electronic ink, a neural network might be used for character recognition while a language model or grammar is used to constrain the word or character sequences recognized. In this case, the neural network SHOULD be referenced using emma:process-model and the grammar or language model using emma:grammar.
In the following example, there are two interpretations. The EMMA document in this example is produced by a computer vision system doing object recognition. The first interpretation is generated by a process model for vehicle recognition, and the second competing interpretation is generated by a process model for person recognition.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:process-model id="pm1" type="neural_network"
      ref="http://example.com/vision/vehicle"/>
  <emma:process-model id="pm2" type="neural_network"
      ref="http://example.com/vision/people"/>
  <emma:one-of emma:start="1087995961542" emma:end="1087995961542"
      emma:medium="visual" emma:mode="image"
      emma:process="http://example.com/mycompvision1.xml">
    <emma:interpretation emma:confidence="0.9" emma:process-model-ref="pm1">
      <object>aircraft</object>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.1" emma:process-model-ref="pm2">
      <object>person</object>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
emma:parameters and emma:parameter elements

Annotation | emma:parameters
---|---
Definition | An element used to indicate a set of parameters used to configure a processor used in producing an EMMA result.
Children | Any number of emma:parameter elements
Attributes | An id attribute, an optional api-ref attribute, and an optional ref attribute (URI).
Applies to | The emma:parameters element MAY appear only as a child of the emma:emma, emma:interpretation, emma:one-of, emma:group, and emma:sequence elements.

Annotation | emma:parameter
---|---
Definition | An element used to indicate a specific parameter in the configuration of a processor used in producing an EMMA result.
Children | None
Attributes | name and value attributes, and an optional api-ref attribute.
Applies to | The emma:parameter element is legal only as a child of the emma:parameters element.
A set of parameters that were used to configure the EMMA processor that produced an EMMA result MAY be specified with the emma:parameters annotation, defined as an element in the EMMA namespace. The emma:parameter-ref attribute (Section 4.2.21) appears on the specific emma:interpretation or other container element and references the appropriate emma:parameters element. For example, typical parameters for speech recognition such as confidence thresholds, speed vs. accuracy settings, timeouts, settings for endpointing, etc., can be included in emma:parameters.
For each emma:parameters element there MUST be an emma:parameter-ref in the document whose value is the id of that emma:parameters. The emma:parameters element cannot have positional scope.
The optional attribute api-ref, on emma:parameter and emma:parameters, specifies the API that the name and value of a parameter, or the names and values of the set of parameters, are drawn from. Its value is a string from an open set including: {vxml2.1, vxml2.0, MRCPv2, MRCPv1, html+speech, OpenCV ...}. A parameter's name and value are drawn from the API specified in api-ref on the emma:parameter element, if present. Otherwise, they are drawn from the API specified in api-ref, if present, on the surrounding emma:parameters element. If api-ref is not defined on either emma:parameter or emma:parameters, the API that the name(s) and value(s) are drawn from is undefined.
In the following example, the interpretation is annotated with emma:parameter-ref to indicate the set of processing parameters that resulted in that interpretation. These are contained within an emma:parameters element under emma:emma. The API for the first two parameters is inherited from emma:parameters and is "vxml2.1". The API for the third parameter is vendor specific and specified directly in api-ref on that emma:parameter element.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:parameters id="parameters1" api-ref="vxml2.1">
    <emma:parameter name="confidencelevel" value=".9"/>
    <emma:parameter name="completetimeout" value=".3s"/>
    <emma:parameter name="word_confusion_network_confidence"
        value="YES" api-ref="x-acme-recognizer"/>
  </emma:parameters>
  <emma:interpretation emma:parameter-ref="parameters1"
      emma:medium="acoustic" emma:mode="voice"
      emma:process="http://example.com/asr">
    <origin>Boston</origin>
  </emma:interpretation>
</emma:emma>
Note that in an EMMA document describing a multimodal input or a derivation with multiple steps, there may be multiple different emma:parameters elements specifying the parameters used for each specific mode or processing stage. The relationship between an emma:parameters element and the container element it applies to is captured by the emma:parameter-ref attribute.
Instead of specifying parameters inline, the ref attribute can be used to provide a URI reference to an external document containing the parameters. This could be either a pointer to an emma:parameters element within an EMMA document, or a reference to a non-EMMA document containing a specification of the parameters. In the following example, the emma:parameters element contains a reference to a separate parameters document.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:parameters id="parameters1" api-ref="vxml2.1"
      ref="http://example.com/mobile/asr/params.xml"/>
  <emma:interpretation emma:parameter-ref="parameters1"
      emma:medium="acoustic" emma:mode="voice"
      emma:process="http://example.com/asr">
    <origin>Boston</origin>
  </emma:interpretation>
</emma:emma>
emma:annotation element

Annotation | emma:annotation
---|---
Definition | The emma:annotation element acts as a container for annotations of user inputs made by human labellers.
Children | One or more elements providing annotations of the input. May also contain a single emma:info element.
Attributes | An id attribute; optional annotator, type, time, and reference attributes; an optional emma:confidence attribute; and an optional ref attribute (URI).
Applies to | The emma:annotation element is legal only as a child of the EMMA elements emma:emma, emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, or emma:node.
In many spoken and multimodal applications, at some time after user interactions have taken place, human labellers are used to provide annotation of the input. For example, for speech input the most common annotation is to transcribe the actual words spoken by the user by listening to the audio. The correct semantic interpretation of the input may also be annotated. Labellers may also annotate other aspects of the input, such as the emotional state of the user.
To provide support for augmenting logged EMMA documents with human annotations, the EMMA markup provides the emma:annotation element. Multiple instances of this element can appear as children of the various EMMA containers. In examples with emma:one-of and multiple emma:interpretation elements, emma:annotation will generally appear as a child of emma:one-of, as it is an annotation of the signal rather than of the specific interpretation hypotheses encoded in the individual interpretations. The emma:annotation element can also be used to annotate arcs and nodes in lattices by including it in emma:arc and emma:node.
In addition to id, the emma:annotation element provides a series of optional attributes that MAY be used to provide metadata regarding the annotation. The annotator attribute contains a string indicating the name or other identifier of the annotator. The type attribute indicates the kind of annotation and has an open set of values {transcription, semantics, emotion ...}. The time attribute on emma:annotation does not have any relation to the time of the input itself; rather, it indicates the date and time that the annotation was made. The emma:confidence attribute is a value between 0 and 1 indicating the annotator's confidence in their annotation. The reference attribute is a boolean which indicates whether the annotation it appears on is the reference annotation for the interpretation, as opposed to some other annotation of the input. For example, if the interpretation in the EMMA document is a speech recognition result, annotation of the reference string SHOULD have reference="true", while an annotation of the emotional state of the user should be annotated as reference="false". Further metadata regarding the annotation can be captured by using emma:info within emma:annotation.
In addition to specifying annotations inline, the emma:annotation element can be used to refer to an external document containing the annotation content.
In the following example, the EMMA document contains an N-best list with two recognition hypotheses and their semantic representations. Under emma:one-of there are three different annotations, all made by different annotators on different days and times. The first is the transcription; it indicates that in fact neither of the N-best results was correct and the actual utterance spoken was "flights from austin to denver tomorrow". The second annotation (label2) contains the annotated semantic interpretation of the reference string. The third annotation contains an additional piece of metadata captured by a human labeller; specifically, it captures the fact that based on the audio, the user's emotional state was angry. Here as an illustration we utilize EmotionML markup.
<emma:emma version="1.1"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
      http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice"
      emma:function="dialog" emma:verbal="true"
      emma:signal="http://example.com/signals/audio457.wav">
    <emma:interpretation emma:confidence="0.75"
        emma:tokens="flights from boston to denver tomorrow">
      <origin>Boston</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:interpretation>
    <emma:interpretation emma:confidence="0.68"
        emma:tokens="flights from austin to denver today">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>today</date>
    </emma:interpretation>
    <emma:annotation id="label1" annotator="joe_bloggs"
        time="2011-10-26T21:32:52" type="transcription"
        emma:confidence="0.9" reference="false">
      <emma:literal>flights from austin to denver tomorrow</emma:literal>
    </emma:annotation>
    <emma:annotation id="label2" annotator="mary_smith"
        time="2011-10-27T12:00:21" type="semantics"
        emma:confidence="1.0" reference="true">
      <origin>Austin</origin>
      <destination>Denver</destination>
      <date>tomorrow</date>
    </emma:annotation>
    <emma:annotation id="label3" annotator="tim_black"
        time="2011-11-10T09:00:21" type="emotion"
        emma:confidence="1.0" reference="false">
      <emotionml xmlns="http://www.w3.org/2009/10/emotionml">
        <emotion>
          <category set="everyday" name="angry"/>
          <modality medium="acoustic" mode="voice"/>
        </emotion>
      </emotionml>
    </emma:annotation>
  </emma:one-of>
</emma:emma>
In addition to this more powerful mechanism for adding human annotation to a document, EMMA also provides a shorthand emma:annotated-tokens attribute for the common use case of adding reference transcriptions to an EMMA document (Section 4.2.22).
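As a minimal sketch of this shorthand (the attribute itself is defined in Section 4.2.22; the token strings here simply reuse those from the example above), the reference transcription could be attached directly to a recognition hypothesis:

<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="acoustic" emma:mode="voice"
      emma:tokens="flights from boston to denver tomorrow"
      emma:annotated-tokens="flights from austin to denver tomorrow">
    <origin>Boston</origin>
    <destination>Denver</destination>
    <date>tomorrow</date>
  </emma:interpretation>
</emma:emma>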
Note that 'annotation' as used in the emma:annotation element and the emma:annotated-tokens attribute refers only to annotations made in a post process by human labellers to indicate what the correct processing (reference) of an input should have been, or to annotate other aspects of the input. This differs from the general sense of annotation as used more broadly in the specification, as in the title "Extensible MultiModal Annotation", which refers in general to metadata provided about an input either by an EMMA processor or by a human labeller. The many annotation elements and attributes in EMMA are used to indicate metadata captured regarding an input. The emma:annotation element and emma:annotated-tokens attribute are specifically for the addition of information provided by human labellers.
Annotations such as the EmotionML in the example above can also be stored in separate files and referenced on an emma:annotation element using ref. Like emma:parameters, a partial specification of the annotation can be provided inline, and emma:partial-content="true" provides an indication that the full annotation can be accessed at ref.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:one-of emma:start="1087995961542" emma:end="1087995963542" emma:medium="acoustic" emma:mode="voice" emma:function="dialog" emma:verbal="true" emma:signal="http://example.com/signals/audio457.wav" emma:confidence="0.75"> <emma:interpretation emma:confidence="0.75" emma:tokens="flights from boston to denver tomorrow"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> <emma:annotation annotator="tim_black" time="2011-11-10T09:00:21" type="emotion" emma:confidence="1.0" reference="false" ref="http://example.com/2011/11/10/emotion123.xml"> </emma:annotation> </emma:one-of></emma:emma>
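As a sketch of the partial inline case just described, the same annotation could carry a fragment of the EmotionML inline, with emma:partial-content="true" signalling that the complete annotation is available at the URI given in ref:

<emma:annotation annotator="tim_black" time="2011-11-10T09:00:21"
    type="emotion" emma:confidence="1.0" reference="false"
    emma:partial-content="true"
    ref="http://example.com/2011/11/10/emotion123.xml">
  <emotionml xmlns="http://www.w3.org/2009/10/emotionml">
    <emotion>
      <category set="everyday" name="angry"/>
    </emotion>
  </emotionml>
</emma:annotation>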
emma:location element
Annotation | emma:location |
---|---|
Definition | The emma:location element acts as a container for information about the location of a user input, more precisely, information about the location of a capture device such as a mobile device. |
Children | none |
Attributes | emma:latitude, emma:longitude, emma:altitude, emma:accuracy, emma:altitudeAccuracy, emma:heading, emma:speed, emma:description, emma:address |
Applies to | The emma:location element is legal only as a child of the EMMA elements emma:emma, emma:interpretation, emma:group, emma:one-of, emma:sequence. |
Many mobile devices and sensors are equipped with geolocation capabilities, and information about where a unimodal or multimodal event occurred can be very useful both for interpretation and logging. Annotating interpretations with location information in EMMA is achieved with the emma:location element. The emma:location element indicates the location of the capture device. In many cases the device location and the user location will be identical, as in the case where the user is carrying a mobile device. In other use cases (e.g. cameras capturing distant motion, far field microphone arrays) the user may be distant from the device location. Capturing the location of the user or other source of signal is beyond the scope of the emma:location annotation. Note that emma:location is not intended as a general semantic representation for location information, e.g. a gesture made at a location on a map or a spoken location; these are rather part of the interpretation and should be contained within emma:interpretation rather than the emma:location annotation element. The location information in emma:location represents a point in space. Since a device or sensor may be moving during the capture of an input, the location may not be the same at the beginning and end of an input. For this reason, the emma:location information is defined to be relative to the beginning of the capture. Note though that the bearing and speed of the sensor can be annotated using the emma:heading and emma:speed attributes on emma:location. The emma:location element represents the location of a single capture device. Use cases where multiple input devices or sensors are involved in the capture of the input can be represented as composite inputs with an emma:location element annotation on each of the interpretations that are composed. The Multimodal Interaction Working Group invites comments on use cases that may require a finer-grained representation of location metadata.
The emma:location attributes are based on the W3C Geolocation API [Geolocation] specification, with the addition of attributes for a description of the location and address information. The formats of the attributes from the Geolocation API are as defined in that specification. Specifically, they are:
The geographic coordinate reference system used by the attributes is the World Geodetic System (2d) [WGS84]. No other reference system is supported.
The emma:latitude and emma:longitude attributes are geographic coordinates of the capture device at the beginning of the capture. They MUST be specified in decimal degrees.
The emma:altitude attribute denotes the height of the position at the beginning of the capture. It MUST be specified in meters above the [WGS84] ellipsoid, or as provided by the device's geolocation implementation. If the implementation cannot provide altitude information, the value of this attribute MUST be the empty string.
The emma:accuracy attribute denotes the accuracy of the latitude and longitude coordinates. It MUST be specified in meters. The value of the emma:accuracy attribute MUST be a non-negative real number.
The emma:altitudeAccuracy attribute is specified in meters. If the implementation cannot provide altitude information, the value of this attribute MUST be the empty string. Otherwise, the value of the emma:altitudeAccuracy attribute MUST be a non-negative real number.
The emma:accuracy and emma:altitudeAccuracy values in an EMMA document SHOULD correspond to a 95% confidence level.
The emma:heading attribute denotes the direction of travel of the capture device at the beginning of the capture, and is specified in degrees, where 0° ≤ heading < 360°, counting clockwise relative to true north. If the implementation cannot provide heading information, the value of this attribute MUST be the empty string. If the capture device is stationary (i.e. the value of the speed attribute is 0), then the value of the emma:heading attribute MUST be the empty string.
The emma:speed attribute denotes the magnitude of the horizontal component of the capture device's velocity at the beginning of the capture, and MUST be specified in meters per second. If the implementation cannot provide speed information, the value of this attribute MUST be the empty string. Otherwise, the value of the emma:speed attribute MUST be a non-negative real number.
The emma:description attribute is an arbitrary string describing the location of the capture device at the beginning of the capture.
The emma:address attribute is an arbitrary string describing the address of the capture device at the beginning of the capture.
The internal formats of the emma:description and the emma:address attributes are not defined in this specification.
The following example shows the location information for an input spoken at the W3C MIT office.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:location emma:latitude="42.361860" emma:longitude="-71.091840" emma:altitude="6.706" emma:accuracy="20.5" emma:altitudeAccuracy="1.6" emma:heading="" emma:speed="" emma:description="W3C MIT office" emma:address="32 Vassar Street, Cambridge, MA 02139 USA"/> <emma:interpretation emma:medium="acoustic" emma:mode="voice" emma:tokens="flights from boston to denver"> <origin>Boston</origin> <destination>Denver</destination> </emma:interpretation> </emma:emma>
emma:tokens attribute
Annotation | emma:tokens |
---|---|
Definition | An attribute of type xsd:string holding a sequence of input tokens. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data. |
The emma:tokens annotation holds a list of input tokens. In the following description, the term tokens is used in the computational and syntactic sense of units of input, and not in the sense of XML tokens. The value held in emma:tokens is the list of the tokens of input as produced by the processor which generated the EMMA document; there is no language associated with this value.
In the case where a grammar is used to constrain input, the value will correspond to tokens as defined by the grammar. So for an EMMA document produced by input to a SRGS grammar [SRGS], the value of emma:tokens will be the list of words and/or phrases that are defined as tokens in SRGS (see Section 2.1 of [SRGS]). Items in the emma:tokens list are delimited by white space and/or quotation marks for phrases containing white space. For example:
emma:tokens="arriving at 'Liverpool Street'"
where the three tokens of input are arriving, at, and Liverpool Street.
The emma:tokens annotation MAY be applied not just to the lexical words and phrases of language but to any level of input processing. Other examples of tokenization include phonemes, ink strokes, gestures, and any other discrete units of input at any level.
Examples:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:tokens="From Cambridge to London tomorrow" emma:medium="acoustic" emma:mode="voice"> <origin emma:tokens="From Cambridge">Cambridge</origin> <destination emma:tokens="to London">London</destination> <date emma:tokens="tomorrow">20030315</date> </emma:interpretation></emma:emma>
emma:process attribute
Annotation | emma:process |
---|---|
Definition | An attribute of type xsd:anyURI referencing the process used to generate the interpretation. |
Applies to | emma:interpretation, emma:one-of, emma:group, emma:sequence |
A reference to the information concerning the processing that was used for generating an interpretation MAY be made using the emma:process attribute. For example:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:derivation> <emma:interpretation id="raw" emma:medium="acoustic" emma:mode="voice"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation id="better" emma:process="http://example.com/mysemproc1.xml"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> <emma:derived-from resource="#raw"/> </emma:interpretation> </emma:derivation> <emma:interpretation emma:process="http://example.com/mysemproc2.xml"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> <emma:derived-from resource="#better"/> </emma:interpretation></emma:emma>
The process description document referenced by the emma:process annotation MAY include information on the process itself, such as grammar, type of parser, etc. EMMA is not normative about the format of the process description document.
Note that while the emma:process attribute may refer to a document that describes the process, the URI syntax itself can be used to briefly describe the process within the EMMA document without actually referring to an external document. For example, the results of a natural language understanding component could be annotated as follows:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:medium="acoustic" emma:mode="voice" emma:tokens="flights from boston to denver tomorrow please" emma:process="http://nlu/classifier=svm&amp;model=travel&amp;output=xml"> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation></emma:emma>
In this case the emma:process attribute indicates that the process is natural language understanding (nlu), that the classifier used is a support vector machine (svm), that the specific model is the 'travel' model, and that the required output was 'xml'. Note that none of the specific values used within the URI here are standardized. This simply illustrates how a URI can be used to provide a detailed process description.
emma:no-input attribute
Annotation | emma:no-input |
---|---|
Definition | Attribute holding an xsd:boolean value that is true if there was no input. |
Applies to | emma:interpretation |
The case of lack of input MUST be annotated as follows:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:no-input="true"
emma:medium="acoustic" emma:mode="voice"/></emma:emma>
If the emma:interpretation is annotated with emma:no-input="true" then the emma:interpretation MUST be empty.
emma:uninterpreted attribute
Annotation | emma:uninterpreted |
---|---|
Definition | Attribute holding an xsd:boolean value that is true if no interpretation was produced in response to the input |
Applies to | emma:interpretation |
An emma:interpretation element representing input for which no interpretation was produced MUST be annotated with emma:uninterpreted="true". For example:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:uninterpreted="true" emma:medium="acoustic" emma:mode="voice"/></emma:emma>
The notation for uninterpreted input MAY refer to any possible stage of interpretation processing, including raw transcriptions. For instance, no interpretation would be produced for stages performing pure signal capture such as audio recordings. Likewise, if a spoken input was recognized but cannot be parsed by a language understanding component, it can be tagged as emma:uninterpreted as in the following example:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:process="http://example.com/mynlu.xml" emma:uninterpreted="true" emma:tokens="From Cambridge to London tomorrow" emma:medium="acoustic" emma:mode="voice"/></emma:emma>
The emma:interpretation element MUST be empty if it is annotated with emma:uninterpreted="true".
emma:lang attribute
Annotation | emma:lang |
---|---|
Definition | An attribute of type xsd:language indicating the language for the input. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data. |
The emma:lang annotation is used to indicate the human language for the input that it annotates. The values of the emma:lang attribute are language identifiers as defined by IETF Best Current Practice 47 [BCP47]. For example, emma:lang="fr" denotes French, and emma:lang="en-US" denotes US English. emma:lang MAY be applied to any emma:interpretation element. Its annotative scope follows the annotative scope of these elements. Unlike the xml:lang attribute in XML, emma:lang does not specify the language used by element contents or attribute values.
The following example shows the use of emma:lang for annotating an input interpretation.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:lang="fr" emma:medium="acoustic" emma:mode="voice"> <answer>arretez</answer> </emma:interpretation></emma:emma>
Many kinds of input, including some inputs made through pen, computer vision, and other kinds of sensors, are inherently non-linguistic. Examples include drawing areas, arrows, etc. using a pen, and music input for tune recognition. If these non-linguistic inputs are annotated with emma:lang then they MUST be annotated as emma:lang="zxx". For example, pen input where a user circles an area on a map display could be represented as follows, where emma:lang="zxx" indicates that the ink input is not in any human language.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:medium="tactile" emma:mode="ink" emma:lang="zxx"> <location> <type>area</type> <points>42.1345 -37.128 42.1346 -37.120 ... </points> </location> </emma:interpretation></emma:emma>
If inputs for which there is no information about whether the source input is in a particular human language, and if so which language, are annotated with emma:lang, then they MUST be annotated as emma:lang="". Furthermore, in cases where there is no explicit emma:lang annotation, and none is inherited from a higher element in the document, the default value for emma:lang is "", meaning that there is no information about whether the source input is in a language and if so which language.
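For example (a minimal sketch, reusing the recording pattern shown in the emma:signal-size example below), an input whose language is unknown could carry an explicit empty annotation:

<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation emma:lang="" emma:medium="acoustic" emma:mode="voice"
      emma:function="recording" emma:uninterpreted="true"
      emma:signal="http://example.com/signals/recording.mpg"/>
</emma:emma>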
The xml:lang and emma:lang attributes serve uniquely different and equally important purposes. The role of the xml:lang attribute in XML 1.0 is to indicate the language used for character data content in an XML element or document. In contrast, the emma:lang attribute is used to indicate the language employed by a user when entering an input. Critically, emma:lang annotates the language of the signal originating from the user rather than the specific tokens used at a particular stage of processing. This is most clearly illustrated through consideration of an example involving multiple stages of processing of a user input. Consider the following scenario: EMMA is being used to represent three stages in the processing of a spoken input to a system for ordering products. The user input is in Italian; after speech recognition, the user input is first translated into English, and then a natural language understanding system converts the English translation into a product ID (which is not in any particular language). Since the input signal is a user speaking Italian, the annotation will be emma:lang="it" on all three of these stages of processing. The xml:lang attribute, in contrast, will initially be "it"; after translation the xml:lang will be "en-US", and after language understanding it will be "zxx" since the product ID is non-linguistic content. The following are examples of EMMA documents corresponding to these three processing stages, abbreviated to show the critical attributes for discussion here. Note that <transcription>, <translation>, and <understanding> are application namespace elements, not part of the EMMA markup.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic"> <transcription xml:lang="it">condizionatore</transcription> </emma:interpretation></emma:emma>
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic"> <translation xml:lang="en-US">air conditioner</translation> </emma:interpretation></emma:emma>
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:lang="it" emma:mode="voice" emma:medium="acoustic"> <understanding xml:lang="zxx">id1456</understanding> </emma:interpretation></emma:emma>
In order to handle inputs involving multiple languages, such as through code switching, the emma:lang attribute MAY contain several language identifiers separated by spaces.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:tokens="please stop arretez s'il vous plait" emma:lang="en fr" emma:medium="acoustic" emma:mode="voice"> <command> CANCEL </command> </emma:interpretation></emma:emma>
emma:signal and emma:signal-size attributes
Annotation | emma:signal |
---|---|
Definition | An attribute of type xsd:anyURI referencing the input signal. |
Applies to | emma:interpretation, emma:one-of, emma:group, emma:sequence, and application instance data. |
Annotation | emma:signal-size |
Definition | An attribute of type xsd:nonNegativeInteger specifying the size in eight-bit octets of the referenced source. |
Applies to | emma:interpretation, emma:one-of, emma:group, emma:sequence, and application instance data. |
A URI reference to the signal that originated the input recognition process MAY be represented in EMMA using the emma:signal annotation. For example, in the case of speech recognition, the emma:signal attribute is the annotation used to reference the audio that was recognized. The MIME type of the audio can be indicated using emma:media-type.
Here is an example where the reference to a speech signal is represented using the emma:signal annotation on the emma:interpretation element:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:signal="http://example.com/signals/sg23.bin" emma:medium="acoustic" emma:mode="voice"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation></emma:emma>
The emma:signal-size annotation can be used to declare the exact size of the associated signal in 8-bit octets. An example of the use of an EMMA document to represent a recording, with emma:signal-size indicating the size, is as follows:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:medium="acoustic" emma:mode="voice" emma:function="recording" emma:uninterpreted="true" emma:signal="http://example.com/signals/recording.mpg" emma:signal-size="82102" emma:duration="10000"> </emma:interpretation></emma:emma>
emma:media-type attribute
Annotation | emma:media-type |
---|---|
Definition | An attribute of type xsd:string holding the MIME type associated with the signal's data format. |
Applies to | emma:interpretation, emma:one-of, emma:group, emma:sequence, emma:endpoint, and application instance data. |
The data format of the signal that originated the input MAY be represented in EMMA using the emma:media-type annotation. An initial set of MIME media types is defined by [RFC2046].
Here is an example where the media type for the ETSI ES 202 212 audio codec for Distributed Speech Recognition (DSR) is applied to the emma:interpretation element. The example also specifies an optional sampling rate of 8 kHz and a maxptime of 40 milliseconds.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:signal="http://example.com/signals/signal.dsr" emma:media-type="audio/dsr-es202212; rate:8000; maxptime:40" emma:medium="acoustic" emma:mode="voice"> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation></emma:emma>
emma:confidence attribute
Annotation | emma:confidence |
---|---|
Definition | An attribute of type xsd:decimal in range 0.0 to 1.0, indicating the processor's confidence in the result. |
Applies to | emma:interpretation, emma:one-of, emma:group, emma:sequence, emma:annotation, and application instance data. |
The emma:confidence annotation is used to indicate the confidence of the processor in a result. The confidence score MUST be a number in the range from 0.0 to 1.0 inclusive. A value of 0.0 indicates minimum confidence, and a value of 1.0 indicates maximum confidence. Note that emma:confidence represents not only the confidence of a speech recognizer, but the confidence of whichever EMMA processor produced the annotated result. For example:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:one-of emma:medium="acoustic" emma:mode="voice"> <emma:interpretation emma:confidence="0.6"> <location>Boston</location> </emma:interpretation> <emma:interpretation emma:confidence="0.4"> <location>Austin</location> </emma:interpretation> </emma:one-of></emma:emma>
In addition to its use as an attribute on the EMMA interpretation and container elements, the emma:confidence attribute MAY also be used to assign confidences to elements in instance data in the application namespace. This can be seen in the following example, where the <destination> and <origin> elements have confidences.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:confidence="0.6" emma:medium="acoustic" emma:mode="voice"> <destination emma:confidence="0.8">Boston</destination> <origin emma:confidence="0.6">Austin</origin> </emma:interpretation></emma:emma>
Although in general instance data can be represented in XMLusing a combination of elements and attributes in the applicationnamespace, EMMA does not provide a standard way to annotateprocessors' confidences in attributes. Consequently, instance datathat is expected to be assigned confidences SHOULD be representedusing elements, as in the above example.
emma:source attribute
Annotation | emma:source |
---|---|
Definition | An attribute of type xsd:anyURI referencing the source of input. |
Applies to | emma:interpretation, emma:one-of, emma:group, emma:sequence, and application instance data. |
The source of an interpreted input MAY be represented in EMMA as a URI resource using the emma:source annotation. Here is an example that shows different input sources for different input interpretations.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example" xmlns:myapp="http://www.example.com/myapp"> <emma:one-of emma:medium="acoustic" emma:mode="voice"> <emma:interpretation emma:source="http://example.com/microphone/NC-61"> <myapp:destination>Boston</myapp:destination> </emma:interpretation> <emma:interpretation emma:source="http://example.com/microphone/NC-4024"> <myapp:destination>Austin</myapp:destination> </emma:interpretation> </emma:one-of></emma:emma>
The start and end times for input MAY be indicated using either absolute timestamps or relative timestamps. Both are in milliseconds for ease in processing timestamps. Note that the ECMAScript Date object's getTime() function is a convenient way to determine the absolute time.
emma:start, emma:end attributes
Annotation | emma:start, emma:end |
---|---|
Definition | Attributes of type xsd:nonNegativeInteger indicating the absolute starting and ending times of an input in terms of the number of milliseconds since 1 January 1970 00:00:00 GMT |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, and application instance data |
Here is an example of a timestamp for an absolute time.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:start="1087995961542" emma:end="1087995963542" emma:medium="acoustic" emma:mode="voice"> <destination>Chicago</destination> </emma:interpretation></emma:emma>
The emma:start and emma:end annotations on an input MAY be identical; however, the emma:end value MUST NOT be less than the emma:start value.
emma:time-ref-uri, emma:time-ref-anchor-point, emma:offset-to-start attributes
Annotation | emma:time-ref-uri |
---|---|
Definition | Attribute of type xsd:anyURI indicating the URI used to anchor the relative timestamp. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:lattice, and application instance data |
Annotation | emma:time-ref-anchor-point |
Definition | Attribute with a value of start or end, defaulting to start. It indicates whether to measure the time from the start or end of the interval designated with emma:time-ref-uri. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:lattice, and application instance data |
Annotation | emma:offset-to-start |
Definition | Attribute of type xsd:integer, defaulting to zero. It specifies the offset in milliseconds for the start of input from the anchor point designated with emma:time-ref-uri and emma:time-ref-anchor-point |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, and application instance data |
Relative timestamps define the start of an input relative to the start or end of a reference interval such as another input.
The reference interval is designated with the emma:time-ref-uri attribute. This MAY be combined with the emma:time-ref-anchor-point attribute to specify whether the anchor point is the start or end of this interval. The start of an input relative to this anchor point is then specified with the emma:offset-to-start attribute.
Here is an example where the referenced input is in the same document:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:sequence> <emma:interpretation id="int1" emma:medium="acoustic" emma:mode="voice"> <origin>Denver</origin> </emma:interpretation> <emma:interpretation emma:medium="acoustic" emma:mode="voice" emma:time-ref-uri="#int1" emma:time-ref-anchor-point="start" emma:offset-to-start="5000"> <destination>Chicago</destination> </emma:interpretation> </emma:sequence></emma:emma>
Note that the reference point refers to an input, but not necessarily to a complete input. For example, if a speech recognizer timestamps each word in an utterance, the anchor point might refer to the timestamp for just one word.
The absolute and relative timestamps are not mutually exclusive; that is, it is possible to have both relative and absolute timestamp attributes on the same EMMA container element.
Timestamps of inputs collected by different devices will be subject to variation if the times maintained by the devices are not synchronized. This concern is outside of the scope of the EMMA specification.
emma:duration attribute
Annotation | emma:duration |
---|---|
Definition | Attribute of type xsd:nonNegativeInteger, defaulting to zero. It specifies the duration of the input in milliseconds. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, and application instance data |
The duration of an input in milliseconds MAY be specified with the emma:duration attribute. The emma:duration attribute MAY be used either in combination with timestamps or independently, for example in the annotation of speech corpora. In the following example, the duration of the signal that gave rise to the interpretation is indicated using emma:duration.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:duration="2300" emma:medium="acoustic" emma:mode="voice"> <origin>Denver</origin> </emma:interpretation></emma:emma>
This section is informative.
The following table provides guidance on how to determine the values of relative timestamps on a composite input.
emma:time-ref-uri | If the reference interval URI is the same for both inputs then it should be the same for the composite input. If it is not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp. |
emma:time-ref-anchor-point | If the anchor value is the same for both inputs then it should be the same for the composite input. If it is not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp. |
emma:offset-to-start | Given that the emma:time-ref-uri and emma:time-ref-anchor-point are the same for both combining inputs, then the emma:offset-to-start for the combination should be the lesser of the two. If they are not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp. |
emma:duration | Given that the emma:time-ref-uri and emma:time-ref-anchor-point are the same for both combining inputs, then the emma:duration is calculated as follows. Add together the emma:offset-to-start and emma:duration for each of the inputs. Take whichever of these is greater and subtract from it the lesser of the emma:offset-to-start values in order to determine the combined duration. If emma:time-ref-uri and emma:time-ref-anchor-point are not the same then relative timestamps will have to be resolved to absolute timestamps in order to determine the combined timestamp. |
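As an illustrative sketch (the numbers and the "#utt1" reference are invented for this example), suppose two combining inputs share the same emma:time-ref-uri and anchor point: a speech input with emma:offset-to-start="1000" and emma:duration="2000", and an ink input with emma:offset-to-start="2500" and emma:duration="2000". The offset-plus-duration sums are 3000 and 4500; taking the greater (4500) and subtracting the lesser offset (1000) gives a combined duration of 3500, with the lesser offset carried over:

<emma:interpretation
    emma:time-ref-uri="#utt1" emma:time-ref-anchor-point="start"
    emma:offset-to-start="1000" emma:duration="3500"
    emma:medium="acoustic tactile" emma:mode="voice ink">
  <!-- composite interpretation combining the speech and ink inputs -->
</emma:interpretation>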
emma:medium, emma:mode, emma:function, emma:verbal attributes
Annotation | emma:medium |
---|---|
Definition | An attribute of type xsd:nmtokens which contains a space delimited set of values from the set {acoustic, tactile, visual}. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:endpoint, and application instance data |
Annotation | emma:mode |
Definition | An attribute of type xsd:nmtokens which contains a space delimited set of values from an open set of values including: {voice, dtmf, ink, gui, keys, video, photograph, ...}. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:endpoint, and application instance data |
Annotation | emma:function |
Definition | An attribute of type xsd:string constrained to values in the open set {recording, transcription, dialog, verification, ...}. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data |
Annotation | emma:verbal |
Definition | An attribute of type xsd:boolean. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data |
Annotation | emma:device-type |
Definition | The type of device, or list of types of device, through which the input is captured. An attribute of type xsd:nmtokens which contains a space delimited set of values from an open set of values including: {microphone, touchscreen, mouse, keypad, keyboard, pen, joystick, touchpad, scanner, camera_2d, camera_3d, thumbwheel, ...}. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data |
Annotation | emma:expressed-through |
Definition | The modality, or list of modalities, through which the interpretation is expressed. An attribute of type xsd:nmtokens which contains a space delimited set of values from an open set of values including: {gaze, face, head, torso, hands, leg, locomotion, posture, physiology, ...}. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data |
EMMA provides two properties for the annotation of input modality: one indicating the broader medium or channel (emma:medium) and another indicating the specific mode of communication used on that channel (emma:mode). The input medium is defined from the user's perspective and indicates whether they use their voice (acoustic), touch (tactile), or visual appearance/motion (visual) as input. Tactile includes most hands-on input device types such as pen, mouse, keyboard, and touchscreen. Visual is used for camera input.
emma:medium = space delimited sequence of values from the set: [acoustic|tactile|visual]
The mode property provides the ability to distinguish between different modes of communication that may be used within a particular medium. For example, in the tactile medium, modes include electronic ink (ink), and pointing and clicking on a graphical user interface (gui).
emma:mode = space delimited sequence of values from the set: [voice|dtmf|ink|gui|keys|video|photograph| ... ]
The emma:medium classification is based on the boundary between the user and the device that they use. For emma:medium="tactile" the user physically touches the device in order to provide input. For emma:medium="visual" the user's movement is captured by sensors (cameras, infrared) resulting in an input to the system. In the case where emma:medium="acoustic" the user provides input to the system by producing an acoustic signal. Note then that DTMF input will be classified as emma:medium="tactile" since in order to provide DTMF input the user physically presses keys on a keypad.
In order to clarify the difference between emma:medium and emma:mode, consider the following examples of different ways to capture drawn input. If the user input consists of drawing, it will be classified as emma:mode="ink". If the user physically draws on a touch-sensitive screen then the input is classified as emma:medium="tactile" since the user interacts with the system by direct contact. If instead the user draws on a tabletop and their input is captured by a camera mounted above (or below) the surface, then the input is emma:medium="visual". Similarly, drawing on a large screen display using hand gestures made in space and sensed with a camera will be classified as emma:mode="ink" and emma:medium="visual".
While emma:medium and emma:mode are optional on specific elements such as emma:interpretation and emma:one-of, note that all EMMA interpretations must be annotated for emma:medium and emma:mode, so either these attributes must appear directly on emma:interpretation, or they must appear on an ancestor emma:one-of node, or they must appear on an earlier stage of the derivation listed in emma:derivation.
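For instance (a minimal sketch along the lines of the confidence example earlier), interpretations that carry no emma:medium or emma:mode of their own are covered by the annotation on the containing emma:one-of:

<emma:one-of emma:medium="acoustic" emma:mode="voice">
  <emma:interpretation emma:confidence="0.6">
    <location>Boston</location>
  </emma:interpretation>
  <emma:interpretation emma:confidence="0.4">
    <location>Austin</location>
  </emma:interpretation>
</emma:one-of>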
The emma:device-type annotation can be used to indicate the specific type of device used to capture the input. This allows for differentiation of multiple different tactile inputs within the ink mode, such as touchscreen input, pen, and mouse.
emma:device-type = space delimited sequence of values from the set: [microphone|keypad|keyboard|touchscreen|touchpad|mouse|pen|joystick|thumbwheel|camera_2d|camera_3d|scanner| ... ]
The emma:device-type attribute SHOULD be used to indicate the general category of the sensor used to capture the input. The specific model number or characteristics SHOULD instead be captured using emma:process (Section 4.2.2).
Orthogonal to the mode, user inputs can also be classified with respect to their communicative function. This enables a simpler mode classification.
emma:function = [recording|transcription|dialog|verification| ... ]
For example, speech can be used for recording (e.g. voicemail), transcription (e.g. dictation), dialog (e.g. interactive spoken dialog systems), and verification (e.g. identifying users through their voiceprints).
EMMA also supports an additional property, emma:verbal, which distinguishes verbal use of an input mode from non-verbal. This MAY be used to distinguish the use of electronic ink to convey handwritten commands from the use of electronic ink for symbolic gestures such as circles and arrows. Handwritten commands, such as writing downtown in order to change a map display to show the downtown, are classified as verbal (emma:function="dialog" emma:verbal="true"). Pen gestures (arrows, lines, circles, etc.), such as circling a building, are classified as non-verbal dialog (emma:function="dialog" emma:verbal="false"). The use of handwritten words to transcribe an email message is classified as transcription (emma:function="transcription" emma:verbal="true").
emma:verbal = [true|false]
Handwritten words and ink gestures are typically recognized using different kinds of recognition components (handwriting recognizer vs. gesture recognizer) and the verbal annotation will be added by the recognition component which classifies the input. The original input source, a pen in this case, will not be aware of this difference. The input source identifier will tell you that the input was from a pen of some kind but will not tell you if the mode of input was handwriting (show downtown) or gesture (e.g. circling an object or area).
Here is an example of the EMMA annotation for a pen input where the user's ink is recognized as either a word ("Boston") or as an arrow:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:one-of> <emma:interpretation emma:confidence="0.6" emma:medium="tactile" emma:mode="ink" emma:device-type="pen" emma:function="dialog" emma:verbal="true"> <location>Boston</location> </emma:interpretation> <emma:interpretation emma:confidence="0.4" emma:medium="tactile" emma:mode="ink" emma:device-type="pen" emma:function="dialog" emma:verbal="false"> <direction>45</direction> </emma:interpretation> </emma:one-of></emma:emma>
Here is an example of the EMMA annotation for a spoken command which is recognized as either "Boston" or "Austin":
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:one-of> <emma:interpretation emma:confidence="0.6" emma:medium="acoustic" emma:mode="voice" emma:device-type="microphone" emma:function="dialog" emma:verbal="true"> <location>Boston</location> </emma:interpretation> <emma:interpretation emma:confidence="0.4" emma:medium="acoustic" emma:mode="voice" emma:device-type="microphone" emma:function="dialog" emma:verbal="true"> <location>Austin</location> </emma:interpretation> </emma:one-of></emma:emma>
The following table shows the relationship between the medium, mode, and function properties and serves as an aid for classifying inputs. For the dialog function it also shows some examples of the classification of inputs as verbal vs. non-verbal.
Medium | Device-type | Mode | Function: recording | Function: dialog | Function: transcription | Function: verification |
---|---|---|---|---|---|---|
acoustic | microphone | voice | audiofile (e.g. voicemail) | spoken command / query / response (verbal = true) | dictation | speaker recognition |
| | | | singing a note (verbal = false) | | |
tactile | keypad | dtmf | audiofile / character stream | typed command / query / response (verbal = true) | text entry (T9-tegic, word completion, or word grammar) | password / pin entry |
| | | | command key "Press 9 for sales" (verbal = false) | | |
| keyboard | dtmf | character / key-code stream | typed command / query / response (verbal = true) | typing | password / pin entry |
| | | | command key "Press S for sales" (verbal = false) | | |
| pen | ink | trace, sketch | handwritten command / query / response (verbal = true) | handwritten text entry | signature, handwriting recognition |
| | | | gesture (e.g. circling building) (verbal = false) | | |
| | gui | N/A | tapping on named button (verbal = true) | soft keyboard | password / pin entry |
| | | | drag and drop, tapping on map (verbal = false) | | |
| touchscreen | ink | trace, sketch | handwritten command / query / response (verbal = true) | handwritten text entry | signature, handwriting recognition |
| | | | gesture (e.g. circling building) (verbal = false) | | |
| | gui | N/A | tapping on named button (verbal = true) | soft keyboard | password / pin entry |
| | | | drag and drop, tapping on map (verbal = false) | | |
| mouse | ink | trace, sketch | handwritten command / query / response (verbal = true) | handwritten text entry | N/A |
| | | | gesture (e.g. circling building) (verbal = false) | | |
| | gui | N/A | clicking named button (verbal = true) | soft keyboard | password / pin entry |
| | | | drag and drop, clicking on map (verbal = false) | | |
| joystick | ink | trace, sketch | gesture (e.g. circling building) (verbal = false) | N/A | N/A |
| | gui | N/A | pointing, clicking button / menu (verbal = false) | soft keyboard | password / pin entry |
visual | scanner | photograph | image | handwritten command / query / response (verbal = true) | optical character recognition, object/scene recognition (markup, e.g. SVG) | N/A |
| | | | drawings and images (verbal = false) | | |
| camera_2d | photograph | image | objects (verbal = false) | visual object/scene recognition | face id, retinal scan |
| camera_2d | video | movie | sign language (verbal = true) | audio/visual recognition | face id, gait id, retinal scan |
| | | | face / hand / arm / body gesture (e.g. pointing, facing) (verbal = false) | | |
The emma:expressed-through attribute describes the modality through which an input is produced, usually by a human being. This differs from the specific mode of communication (emma:mode) and the broader channel or medium (emma:medium). For example, in the case where a user provides ink input on a touchscreen using their hands, the input would be classified as emma:medium="tactile", emma:mode="ink", and emma:expressed-through="hands". The emma:expressed-through attribute is not specific about the sensors used for observing the modality. These can be specified using the emma:medium and emma:mode attributes.
This mechanism allows for more fine-grained annotation of the specific body part that is analyzed in the assignment of an EMMA result. For example, in an emotion recognition task using computer vision techniques on video camera input, emma:medium="visual" and emma:mode="video". If the face is being analyzed to determine the result then emma:expressed-through="face", while if the body motion is being analyzed then emma:expressed-through="locomotion".
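A minimal sketch of the face-based case just described might look as follows; the application namespace element <emotion> and the confidence value are invented for this illustration:

<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.example.com/example">
  <emma:interpretation emma:medium="visual" emma:mode="video"
      emma:expressed-through="face" emma:confidence="0.8">
    <emotion>angry</emotion>
  </emma:interpretation>
</emma:emma>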
The list of values provided covers a broad range of modalities through which inputs may be expressed. These values SHOULD be used if they are appropriate. The list is an open set in order to allow for more fine-grained distinctions such as "eyes" vs. "mouth" etc.
emma:hook attribute
Annotation | emma:hook |
---|---|
Definition | An attribute of type xsd:string constrained to values in the open set {voice, dtmf, ink, gui, keys, video, photograph, ...} or the wildcard any |
Applies to | Application instance data |
The attribute emma:hook MAY be used to mark the elements in the application semantics within an emma:interpretation which are expected to be integrated with content from input in another mode to yield a complete interpretation. The emma:mode to be integrated at that point in the application semantics is indicated as the value of the emma:hook attribute. The possible values of emma:hook are the list of input modes that can be values of emma:mode (see Section 4.2.11). In addition to these, the value of emma:hook can also be the wildcard any, indicating that the other content can come from any source. The annotation emma:hook differs in semantics from emma:mode as follows. Annotating an element in the application semantics with emma:mode="ink" indicates that that part of the semantics came from the ink mode. Annotating an element in the application semantics with emma:hook="ink" indicates that part of the semantics needs to be integrated with content from the ink mode.
To illustrate the use of emma:hook, consider an example composite input in which the user says "zoom in here" in the speech input mode while drawing an area on a graphical display in the ink input mode. The fact that the location element needs to come from the ink mode is indicated by annotating this application namespace element using emma:hook:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:medium="acoustic" emma:mode="voice"> <command> <action>zoom</action> <location emma:hook="ink"> <type>area</type> </location> </command> </emma:interpretation></emma:emma>
For a more detailed explanation of this example see Appendix C.
emma:cost attribute
Annotation | emma:cost |
---|---|
Definition | An attribute of type xsd:decimal in range 0.0 to 10000000, indicating the processor's cost or weight associated with an input or part of an input. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, emma:node, and application instance data. |
The cost annotation in EMMA indicates the weight or cost associated with a user's input or part of their input. The most common use of emma:cost is for representing the costs encoded on a lattice output from speech recognition or other recognition or understanding processes. emma:cost MAY also be used to indicate the total cost associated with particular recognition results or semantic interpretations.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:one-of emma:medium="acoustic" emma:mode="voice"> <emma:interpretation emma:cost="1600"> <location>Boston</location> </emma:interpretation> <emma:interpretation emma:cost="400"> <location>Austin</location> </emma:interpretation> </emma:one-of></emma:emma>
emma:endpoint-role, emma:endpoint-address, emma:port-type, emma:port-num, emma:message-id, emma:service-name, emma:endpoint-pair-ref, emma:endpoint-info-ref attributes
Annotation | emma:endpoint-role |
---|---|
Definition | An attribute of type xsd:string constrained to values in the set {source, sink, reply-to, router}. |
Applies to | emma:endpoint |
Annotation | emma:endpoint-address |
Definition | An attribute of type xsd:anyURI that uniquely specifies the network address of the emma:endpoint. |
Applies to | emma:endpoint |
Annotation | emma:port-type |
Definition | An attribute of type xsd:QName that specifies the type of the port. |
Applies to | emma:endpoint |
Annotation | emma:port-num |
Definition | An attribute of type xsd:nonNegativeInteger that specifies the port number. |
Applies to | emma:endpoint |
Annotation | emma:message-id |
Definition | An attribute of type xsd:anyURI that specifies the message ID associated with the data. |
Applies to | emma:endpoint |
Annotation | emma:service-name |
Definition | An attribute of type xsd:string that specifies the name of the service. |
Applies to | emma:endpoint |
Annotation | emma:endpoint-pair-ref |
Definition | An attribute of type xsd:anyURI that specifies the pairing between sink and source endpoints. |
Applies to | emma:endpoint |
Annotation | emma:endpoint-info-ref |
Definition | An attribute of type xsd:IDREF referring to the id attribute of an emma:endpoint-info element. |
Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data. |
The emma:endpoint-role attribute specifies the role that the particular emma:endpoint performs in multimodal interaction. The role value sink indicates that the particular endpoint is the receiver of the input data. The role value source indicates that the particular endpoint is the sender of the input data. The role value reply-to indicates that the particular emma:endpoint is the intended endpoint for the reply. The same emma:endpoint-address MAY appear in multiple emma:endpoint elements, provided that the same endpoint address is used to serve multiple roles, e.g. sink, source, reply-to, router, etc., or associated with multiple interpretations.
The emma:endpoint-address specifies the network address of the emma:endpoint, and emma:port-type specifies the port type of the emma:endpoint. The emma:port-num annotates the port number of the endpoint (e.g. the typical port number for an http endpoint is 80). The emma:message-id annotates the message ID information associated with the annotated input. This meta information is used to establish and maintain the communication context for both inbound processing and outbound operation. The service specification of the emma:endpoint is annotated by emma:service-name, which contains the definition of the service that the emma:endpoint performs. The matching of the sink endpoint and its pairing source endpoint is annotated by the emma:endpoint-pair-ref attribute. One sink endpoint MAY link to multiple source endpoints through emma:endpoint-pair-ref. Further bounding of the emma:endpoint is possible by using the annotation of emma:group (see Section 3.3.2).
The emma:endpoint-info-ref attribute associates the EMMA result in the container element with an emma:endpoint-info element.
The following example illustrates the use of these attributes in multimodal interactions where multiple modalities are used.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example" xmlns:ex="http://www.example.com/emma/port"> <emma:endpoint-info id="audio-channel-1"> <emma:endpoint id="endpoint1" emma:endpoint-role="sink" emma:endpoint-address="135.61.71.103" emma:port-num="50204" emma:port-type="rtp" emma:endpoint-pair-ref="endpoint2" emma:media-type="audio/dsr-es202212; rate:8000; maxptime:40" emma:service-name="travel" emma:mode="voice"> <ex:app-protocol>SIP</ex:app-protocol> </emma:endpoint> <emma:endpoint id="endpoint2" emma:endpoint-role="source" emma:endpoint-address="136.62.72.104" emma:port-num="50204" emma:port-type="rtp" emma:endpoint-pair-ref="endpoint1" emma:media-type="audio/dsr-es202212; rate:8000; maxptime:40" emma:service-name="travel" emma:mode="voice"> <ex:app-protocol>SIP</ex:app-protocol> </emma:endpoint> </emma:endpoint-info> <emma:endpoint-info id="ink-channel-1"> <emma:endpoint id="endpoint3" emma:endpoint-role="sink" emma:endpoint-address="http://emma.example/sink" emma:endpoint-pair-ref="endpoint4" emma:port-num="80" emma:port-type="http" emma:message-id="uuid:2e5678" emma:service-name="travel" emma:mode="ink"/> <emma:endpoint id="endpoint4" emma:endpoint-role="source" emma:endpoint-address="http://emma.example/source" emma:endpoint-pair-ref="endpoint3" emma:port-num="80" emma:port-type="http" emma:message-id="uuid:2e5678" emma:service-name="travel" emma:mode="ink"/> </emma:endpoint-info> <emma:group> <emma:interpretation emma:start="1087995961542" emma:end="1087995963542" emma:endpoint-info-ref="audio-channel-1" emma:medium="acoustic" emma:mode="voice"> <destination>Chicago</destination> </emma:interpretation> <emma:interpretation emma:start="1087995961542" emma:end="1087995963542" emma:endpoint-info-ref="ink-channel-1" emma:medium="tactile" emma:mode="ink"> <location> <type>area</type> <points>34.13 -37.12 42.13 -37.12 ... </points> </location> </emma:interpretation> </emma:group></emma:emma>
emma:grammar element: emma:grammar-ref attribute

| Annotation | emma:grammar-ref |
| --- | --- |
| Definition | An attribute of type xsd:IDREF referring to the id attribute of an emma:grammar element. |
| Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and emma:active. |
The emma:grammar-ref attribute associates the EMMA result in the container element with an emma:grammar element. The emma:grammar-ref attribute is also used on emma:active elements within emma:grammar-active in order to indicate which grammars are active during the processing of an input (4.1.4).
The following example shows the use of emma:grammar-ref on the container element emma:interpretation and on the emma:active element:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:grammargrammar-type="application/srgs-xmlref="someURI"/> <emma:grammargrammar-type="application/srgs-xmlref="anotherURI"/> <emma:one-ofemma:medium="acoustic" emma:mode="voice"><emma:grammar-active> <emma:active emma:grammar-ref="gram1"/> <emma:active emma:grammar-ref="gram2"/> </emma:grammar-active> <emma:interpretation emma:grammar-ref="gram1"> <origin>Boston</origin> </emma:interpretation> <emma:interpretation emma:grammar-ref="gram1"> <origin>Austin</origin> </emma:interpretation> <emma:interpretation emma:grammar-ref="gram2"> <command>help</command> </emma:interpretation> </emma:one-of></emma:emma>
emma:model element: emma:model-ref attribute

| Annotation | emma:model-ref |
| --- | --- |
| Definition | An attribute of type xsd:IDREF referring to the id attribute of an emma:model element. |
| Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data. |
The emma:model-ref annotation associates the EMMA result in the container element with an emma:model element.
Example:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:model ref="someURI"/> <emma:model ref="anotherURI"/> <emma:one-ofemma:medium="acoustic" emma:mode="voice"> <emma:interpretation emma:model-ref="model1"> <origin>Boston</origin> </emma:interpretation> <emma:interpretation emma:model-ref="model1"> <origin>Austin</origin> </emma:interpretation> <emma:interpretation emma:model-ref="model2"> <command>help</command> </emma:interpretation> </emma:one-of></emma:emma>
emma:dialog-turn attribute

| Annotation | emma:dialog-turn |
| --- | --- |
| Definition | An attribute of type xsd:string referring to the dialog turn associated with a given container element. |
| Applies to | emma:interpretation, emma:group, emma:one-of, and emma:sequence. |
The emma:dialog-turn annotation associates the EMMA result in the container element with a dialog turn. The syntax and semantics of dialog turns are left open to suit the needs of individual applications. For example, some applications might use an integer value, where successive turns are represented by successive integers. Other applications might combine the name of a dialog participant with an integer value representing the turn number for that participant. Ordering semantics for comparison of emma:dialog-turn values are deliberately unspecified and left for applications to define.
Example:
<emma:emma version="1.1" emma="http://www.w3.org/2003/04/emma" xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:dialog-turn="u8"emma:medium="acoustic" emma:mode="voice"> <quantity>3</quantity> </emma:interpretation></emma:emma>
emma:result-format attribute

| Annotation | emma:result-format |
| --- | --- |
| Definition | An attribute of type xsd:string containing a MIME type which indicates the representation used in the application semantics that appears within the contained emma:interpretation. |
| Applies to | emma:interpretation, emma:literal, emma:group, emma:one-of, and emma:sequence. |
Typically, the application semantics contained within EMMA is in XML format, as can be seen in examples throughout the specification. The application semantics can also be a simple string, contained within emma:literal. EMMA also accommodates other semantic representation formats such as JSON (JavaScript Object Notation [JSON]) using CDATA within emma:literal. The function of the emma:result-format attribute is to make explicit the specific format of the semantic representation. The value is a MIME type. The value generally to be used for XML semantic representations is text/xml. If emma:result-format is not specified, the assumed default is text/xml. If a more specific XML MIME type is being used then this should be indicated explicitly in emma:result-format, e.g. for RDF the emma:result-format would be application/rdf+xml. In the following example, the application semantic representation is JSON and the MIME type application/json appears in emma:result-format, indicating to an EMMA processor what to expect within the contained emma:literal.
<emma:emma
version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns="http://www.example.com/example">
<emma:interpretation id="int1"
emma:confidence="0.75"
emma:medium="acoustic"
emma:mode="voice"
emma:verbal="true"
emma:function="dialog"
emma:result-format="application/json"
<emma:literal>
<![CDATA[
{
"drink": {
"liquid": "coke",
"drinksize": "medium"},
"pizza": {
"number": "3",
"pizzasize": "large",
"topping": [ "pepperoni", "mushrooms" ]
}
}
]]>
</emma:literal>
</emma:interpretation>
</emma:emma>
Note that while many of the examples of semantic representation in the specification are simple lists of attributes and values, EMMA interpretations can contain arbitrarily complex semantic representations. XML representation can be used for the payload, so representations can be nested, have attributes, and ID references can be used to capture aspects of the interpretation such as variable binding or co-reference. Also, using emma:result-format and emma:literal as above, other kinds of logical representations and notations, not necessarily XML, can also be carried as EMMA payloads.
emma:info element: emma:info-ref attribute

| Annotation | emma:info-ref |
| --- | --- |
| Definition | An attribute of type xsd:IDREF referring to the id attribute of an emma:info element. |
| Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data. |
The emma:info-ref annotation associates the EMMA result in the container element with a particular emma:info element. This allows a single emma:info block of application- and vendor-specific annotations to apply to multiple different members of an emma:one-of, emma:group, or emma:sequence. Alternatively, emma:info could appear separately as a child of each emma:interpretation. The benefit of using emma:info-ref is that it avoids the need to repeat the same block of emma:info for multiple different interpretations.
Example:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:info> <customer_type>residential</customer_type> <service_name>acme_travel_service</service_name> </emma:info> <emma:info> <customer_type>residential</customer_type> <service_name>acme_pizza_service</service_name> </emma:info> <emma:one-of emma:start="1087995961542" emma:end="1087995963542" emma:medium="acoustic" emma:mode="voice"> <emma:interpretation emma:confidence="0.75" emma:tokens="flights from boston to denver tomorrow" emma:info-ref="info1"> <origin>Boston</origin> <destination>Denver</destination> </emma:interpretation> <emma:interpretation emma:confidence="0.68" emma:tokens="pizza with pepperoni and onions" emma:info-ref="info2"> <order>pizza</order> <topping>pepperoni</topping> <topping>onion</topping> </emma:interpretation> <emma:interpretation emma:confidence="0.38" emma:tokens="pizza with peppers and cheese" emma:info-ref="info2"> <order>pizza</order> <topping>pepperoni</topping> <topping>cheese</topping> </emma:interpretation> </emma:one-of>
emma:process-model element: emma:process-model-ref attribute

| Annotation | emma:process-model-ref |
| --- | --- |
| Definition | An attribute of type xsd:IDREF referring to the id attribute of an emma:process-model element. |
| Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, and application instance data. |
The emma:process-model-ref annotation associates the EMMA result in the container element with an emma:process-model element. In the following example, the specific models used to produce two different object recognition results based on an image input are indicated on the interpretations using emma:process-model-ref, which references an emma:process-model element under emma:emma whose ref attribute contains a URI identifying the particular model used.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:process-model type="neural_network" ref="http://example.com/vision/vehicle"/> <emma:process-model type="neural_network" ref="http://example.com/vision/people"/> <emma:one-of emma:start="1087995961542" emma:end="1087995961542" emma:medium="visual" emma:mode="image" emma:process="http://example.com/mycompvision1.xml">> <emma:interpretation emma:confidence="0.9" emma:process-model-ref="pm1"> <object>aircraft</object> </emma:interpretation> <emma:interpretation emma:confidence="0.1" emma:process-model-ref="pm2"> <object>person</object> </emma:interpretation> </emma:one-of></emma:emma>
emma:parameters element: emma:parameter-ref attribute

| Annotation | emma:parameter-ref |
| --- | --- |
| Definition | An attribute of type xsd:IDREF referring to the id attribute of an emma:parameters element. |
| Applies to | emma:interpretation, emma:group, emma:one-of, and emma:sequence. |
The emma:parameter-ref annotation associates the EMMA result(s) in the container element it appears on with an emma:parameters element that specifies a series of parameters used to configure the processor that produced those result(s). This allows a set of parameters to be specified once in an EMMA document and referred to by multiple different interpretations. Different configurations of parameters can be associated with different interpretations. In the example below, there are two emma:parameters elements, and in the N-best list of alternative interpretations within emma:one-of each emma:interpretation references the relevant set of parameters using emma:parameter-ref.
<emma:emma version="1.1"
xmlns:emma="http://www.w3.org/2003/04/emma"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2003/04/emma
http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
xmlns="http://www.example.com/example">
  <emma:parameters id="parameters1" api-ref="voicexml2.1">
    <emma:parameter name="speedvsaccuracy" value=".5"/>
    <emma:parameter name="sensitivity" value=".6"/>
  </emma:parameters>
  <emma:parameters id="parameters2" api-ref="voicexml2.1">
    <emma:parameter name="speedvsaccuracy" value=".7"/>
    <emma:parameter name="sensitivity" value=".3"/>
  </emma:parameters>
  <emma:one-of emma:medium="acoustic" emma:mode="voice"
      emma:process="http://example.com/myasr1.xml">
    <emma:interpretation emma:parameter-ref="parameters1">
      <origin>Boston</origin>
    </emma:interpretation>
    <emma:interpretation emma:parameter-ref="parameters2">
      <origin>Austin</origin>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
emma:annotated-tokens attribute

| Annotation | emma:annotated-tokens |
| --- | --- |
| Definition | An attribute of type xsd:string holding the reference sequence of tokens determined by a human annotator. |
| Applies to | emma:interpretation, emma:group, emma:one-of, emma:sequence, emma:arc, and application instance data. |
The emma:annotated-tokens attribute holds a list of input tokens. In the following description, the term tokens is used in the computational and syntactic sense of units of input, and not in the sense of XML tokens. The value held in emma:annotated-tokens is the list of the tokens of input as determined by a human annotator. For example, in the case of speech recognition this will contain the reference string. The emma:annotated-tokens annotation MAY be applied not just to the lexical words and phrases of language but to any level of input processing. Other examples of tokenization include phonemes, ink strokes, gestures, and any other discrete units of input at any level.
In the following example, a speech recognizer has processed an audio input signal and the hypothesized string is "from cambridge to london tomorrow", contained in emma:tokens. A human labeller has listened to the audio and added the reference string "from canterbury to london today" in the emma:annotated-tokens attribute.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:medium="acoustic" emma:mode="voice" emma:function="dialog" emma:verbal="true" emma:signal="http://example.com/audio/input678.amr" emma:process="http://example.com/asr/params.xml" emma:tokens="from cambridge to london tomorrow"
emma:annotated-tokens="from canterbury to london today"> <origin>Cambridge</origin> <destination>London</destination> <date>tomorrow</date> </emma:interpretation></emma:emma>
In order to provide metadata on the annotation, such as the name of the annotator or time of annotation, the more powerful emma:annotation element mechanism should be used. This also allows for structured annotations such as labelling of a semantic interpretation in XML.
emma:partial-content attribute

| Annotation | emma:partial-content |
| --- | --- |
| Definition | An attribute of type xsd:boolean indicating whether the content of an element is partial; the full element can be retrieved by dereferencing the URI indicated in the ref attribute on the same element. |
| Applies to | emma:one-of, emma:group, emma:sequence, emma:lattice, emma:info, emma:annotation, emma:parameters, and application instance data. |
The emma:partial-content attribute is required on the element it applies to when the content contained within the element is a subset of the content contained within the element referred to through the ref attribute on the same element. If the local element is empty, but a full document can be retrieved from the server, then emma:partial-content MUST be true. If the element is empty and the element on the server is also empty, then emma:partial-content MUST be false. If emma:partial-content is not specified, the default value is false.
The emma:derived-from element (Section 4.1.2) can be used to capture both sequential and composite derivations. This section concerns the scope of EMMA annotations across sequential derivations of user input connected using the emma:derived-from element (Section 4.1.2). Sequential derivations involve processing steps that do not involve multimodal integration, such as applying natural language understanding and then reference resolution to a speech transcription. EMMA derivations describe only single turns of user input and are not intended to describe a sequence of dialog turns.
For example, an EMMA document could contain emma:interpretation elements for the transcription, interpretation, and reference resolution of a speech input, utilizing the id values raw, better, and best respectively:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:derivation> <emma:interpretation emma:process="http://example.com/myasr1.xml"emma:medium="acoustic" emma:mode="voice"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:process="http://example.com/mynlu1.xml"> <emma:derived-from resource="#raw" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> </emma:derivation> <emma:interpretation emma:process="http://example.com/myrefresolution1.xml"> <emma:derived-from resource="#better" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation></emma:emma>
Each member of the derivation chain is linked to the previous one by a derived-from element (Section 4.1.2), which has an attribute resource that provides a pointer to the emma:interpretation from which it is derived. The emma:process annotation (Section 4.2.2) provides a pointer to the process used for each stage of the derivation.
The following EMMA example represents the same derivation as above but with a more fully specified set of annotations:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:derivation> <emma:interpretation emma:process="http://example.com/myasr1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="voice" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:process="http://example.com/mynlu1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.8" emma:medium="acoustic" emma:mode="voice" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <emma:derived-from resource="#raw" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> </emma:derivation> <emma:interpretation emma:process="http://example.com/myrefresolution1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.8" emma:medium="acoustic" emma:mode="voice" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <emma:derived-from resource="#better" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation></emma:emma>
EMMA annotations on earlier stages of the derivation often remain accurate at later stages of the derivation. Although this can be captured in EMMA by repeating the annotations on each emma:interpretation within the derivation, as in the example above, there are two disadvantages to this approach to annotation. First, the repetition of annotations makes the resulting EMMA documents significantly more verbose. Second, EMMA processors used for intermediate tasks such as natural language understanding and reference resolution will need to read in all of the annotations and write them all out again.
EMMA overcomes these problems by assuming that annotations on earlier stages of a derivation automatically apply to later stages of the derivation unless a new value is specified. Later stages of the derivation essentially inherit annotations from earlier stages in the derivation. For example, if there were an emma:source annotation on the transcription (raw), it would also apply to the later stages of the derivation such as the result of natural language understanding (better) or reference resolution (best).
Because of the assumption in EMMA that annotations have scope over later stages of a sequential derivation, the example EMMA document above can be equivalently represented as follows:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:derivation> <emma:interpretation emma:process="http://example.com/myasr1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.6" emma:medium="acoustic" emma:mode="voice" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <answer>From Boston to Denver tomorrow</answer> </emma:interpretation> <emma:interpretation emma:process="http://example.com/mynlu1.xml" emma:confidence="0.8"> <emma:derived-from resource="#raw" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>tomorrow</date> </emma:interpretation> </emma:derivation> <emma:interpretation emma:process="http://example.com/myrefresolution1.xml"> <emma:derived-from resource="#better" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation></emma:emma>
The fully specified derivation illustrated above is equivalent to the reduced form derivation following it, where only annotations with new values are specified at each stage. These two EMMA documents MUST yield the same result when processed by an EMMA processor.
The emma:confidence annotation is respecified on the better interpretation. This indicates the confidence score for natural language understanding, whereas emma:confidence on the raw interpretation indicates the speech recognition confidence score.
In order to determine the full set of annotations that apply to an emma:interpretation element, an EMMA processor or script needs to access the annotations directly on that element and, for any that are not specified, follow the reference in the resource attribute of the emma:derived-from element to add in annotations from earlier stages of the derivation.
The EMMA annotations break down into three groups with respect to their scope in sequential derivations. One group of annotations always holds true for all members of a sequential derivation. A second group is always respecified on each stage of the derivation. A third group may or may not be respecified.
| Classification | Annotation |
| --- | --- |
| Applies to whole derivation | emma:signal |
| | emma:signal-size |
| | emma:dialog-turn |
| | emma:source |
| | emma:medium |
| | emma:mode |
| | emma:function |
| | emma:verbal |
| | emma:lang |
| | emma:tokens |
| | emma:start |
| | emma:end |
| | emma:time-ref-uri |
| | emma:time-ref-anchor-point |
| | emma:offset-to-start |
| | emma:duration |
| Specified at each stage of derivation | emma:derived-from |
| | emma:process |
| May be respecified | emma:confidence |
| | emma:cost |
| | emma:grammar-ref |
| | emma:model-ref |
| | emma:no-input |
| | emma:uninterpreted |
One potential problem with this annotation scoping mechanism is that earlier annotations could be lost if earlier stages of a derivation were dropped in order to reduce message size. This problem can be overcome by considering annotation scope at the point where earlier derivation stages are discarded and populating the final interpretation in the derivation with all of the annotations which it could inherit. For example, if the raw and better stages were dropped, the resulting EMMA document would be:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:start="1087995961542" emma:end="1087995963542" emma:process="http://example.com/myrefresolution1.xml" emma:source="http://example.com/microphone/NC-61" emma:signal="http://example.com/signals/sg23.wav" emma:confidence="0.8" emma:medium="acoustic" emma:mode="voice" emma:function="dialog" emma:verbal="true" emma:tokens="from boston to denver tomorrow" emma:lang="en-US"> <emma:derived-from resource="#better" composite="false"/> <origin>Boston</origin> <destination>Denver</destination> <date>03152003</date> </emma:interpretation></emma:emma>
Annotations on an emma:one-of element are assumed to apply to all of the container elements within the emma:one-of.

If an emma:one-of appears within another emma:one-of, then annotations on the parent emma:one-of are assumed to apply to the children of the child emma:one-of.

Annotations on emma:group or emma:sequence do not apply to their child elements.
The contents of this section are normative.
A document is a Conforming EMMA Document if it meets both of the following conditions:
The EMMA specification and these conformance criteria provide no designated size limits on any aspect of EMMA documents. There are no maximum values on the number of elements, the amount of character data, or the number of characters in attribute values.
Within this specification, the term URI refers to a Uniform Resource Identifier as defined in [RFC3986] and extended in [RFC3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as "Base URI" that are defined or referenced across the whole family of XML specifications.
The EMMA namespace is intended to be used with other XML namespaces as per the Namespaces in XML Recommendation [XMLNS]. Future work by W3C is expected to address ways to specify conformance for documents involving multiple namespaces.
An EMMA processor is a program that can process and/or generate Conforming EMMA Documents.
In a Conforming EMMA Processor, the XML parser MUST be able to parse and process all XML constructs defined by XML 1.1 [XML] and Namespaces in XML [XMLNS]. It is not required that a Conforming EMMA Processor use a validating XML parser.

A Conforming EMMA Processor MUST correctly understand and apply the semantics of each markup element or attribute as described by this document.

There is, however, no conformance requirement with respect to the performance characteristics of the EMMA Processor. For instance, no statement is required regarding the accuracy, speed, or other characteristics of output produced by the processor. No statement is made regarding the size of input that an EMMA Processor is required to support.
This section is Normative.
This section defines the formal syntax for EMMA documents in terms of a normative XML Schema.

The schema provided here is for the EMMA 1.0 Recommendation. No schema exists as yet for the EMMA 1.1 Working Draft, as it is a work in progress.
There are both an XML Schema and a RELAX NG Schema for the EMMA markup. The latest version of the XML Schema for EMMA is available at http://www.w3.org/TR/emma/emma.xsd and the RELAX NG Schema can be found at http://www.w3.org/TR/emma/emma.rng.

For stability it is RECOMMENDED that you use the dated URIs available at http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd and http://www.w3.org/TR/2009/REC-emma-20090210/emma.rng.
This section is Normative.
The media type associated with the EMMA: Extensible MultiModal Annotation markup language specification is "application/emma+xml" and the filename suffix is ".emma", as defined in Appendix B.1 of the EMMA: Extensible Multimodal Annotation specification.
emma:hook and SRGS

This section is Informative.
One of the most powerful aspects of multimodal interfaces is their ability to provide support for user inputs which are distributed over the available input modes. These composite inputs are contributions made by the user within a single turn which have component parts in different modes. For example, the user might say "zoom in here" in the speech mode while drawing an area on a graphical display in the ink mode. One of the central motivating factors for this kind of input is that different kinds of communicative content are best suited to different input modes. In the example of a user drawing an area on a map and saying "zoom in here", the zoom command is easiest to provide in speech but the spatial information, the specific area, is easier to provide in ink.
Enabling composite multimodality is critical in ensuring that multimodal systems support more natural and effective interaction for users. In order to support composite inputs, a multimodal architecture must provide some kind of multimodal integration mechanism. In the W3C Multimodal Interaction Framework [MMI Framework], multimodal integration can be handled by an integration component which follows the application of speech understanding and other kinds of interpretation procedures for individual modes.
Given the broad range of different techniques being employed for multimodal integration and the extent to which this is an ongoing research problem, standardization of the specific method or algorithm used for multimodal integration is not appropriate at this time. In order to facilitate the development and inter-operation of different multimodal integration mechanisms, EMMA provides markup language enabling application-independent specification of elements in the application markup where content from another mode needs to be integrated. These representation 'hooks' can then be used by different kinds of multimodal integration components and algorithms to drive the process of multimodal integration. In the processing of a composite multimodal input, the result of applying a mode-specific interpretation component to each of the individual modes will be EMMA markup describing the possible interpretation of that input.
One way to build an EMMA representation of a spoken input such as "zoom in here" is to use grammar rules in the W3C Speech Recognition Grammar Specification [SRGS], using Semantic Interpretation [SISR] tags to build the application semantics with the emma:hook attribute. In this approach, [ECMAScript] is specified in order to build up an object representing the semantics. The resulting ECMAScript object is then translated to XML.

For our example case of "zoom in here", the following SRGS rule could be used. The Semantic Interpretation for Speech Recognition specification [SISR] provides a reserved property _nsprefix for indicating the namespace to be used with an attribute.
<rule>
  zoom in here
  <tag>
    $.command = new Object();
    $.command.action = "zoom";
    $.command.location = new Object();
    $.command.location._attributes = new Object();
    $.command.location._attributes.hook = new Object();
    $.command.location._attributes.hook._nsprefix = "emma";
    $.command.location._attributes.hook._value = "ink";
    $.command.location.type = "area";
  </tag>
</rule>
Application of this rule will result in the following ECMAScript object being built.
command: {
  action: "zoom"
  location: {
    _attributes: {
      hook: {
        _nsprefix: "emma"
        _value: "ink"
      }
    }
    type: "area"
  }
}
SI processing in an XML environment would generate the following document:
<command>
  <action>zoom</action>
  <location emma:hook="ink">
    <type>area</type>
  </location>
</command>
This XML fragment might then appear within an EMMA document as follows:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:medium="acoustic" emma:mode="voice"> <command> <action>zoom</action> <location emma:hook="ink"> <type>area</type> </location> </command> </emma:interpretation></emma:emma>
The emma:hook annotation indicates that this speech input needs to be combined with ink input such as the following:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:medium="tactile" emma:mode="ink"> <location> <type>area</type> <points>42.1345 -37.128 42.1346 -37.120 ... </points> </location> </emma:interpretation></emma:emma>
This representation could be generated by a pen modality component performing gesture recognition and interpretation. The input to the component would be an Ink Markup Language specification [INKML] of the ink trace and the output would be the EMMA document above.
The combination will result in the following EMMA document for the combined speech and pen multimodal input.
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretation emma:medium="acoustic tactile" emma:mode="voice ink" emma:process="http://example.com/myintegrator.xml"> <emma:derived-from resource="http://example.com/voice1.emma/#voice1" composite="true"/> <emma:derived-from resource="http://example.com/pen1.emma/#pen1" composite="true"/> <command> <action>zoom</action> <location> <type>area</type> <points>42.1345 -37.128 42.1346 -37.120 ... </points> </location> </command> </emma:interpretation></emma:emma>
There are two components to the process of integrating these two pieces of semantic markup. The first is to ensure that the two are compatible; that is, that no semantic constraints are violated. The second is to fuse the content from the two sources. In our example, the <type>area</type> element is intended to indicate that this speech command requires integration with an area gesture rather than, for example, a line gesture, which would have the subelement <type>line</type>. This constraint needs to be enforced by whatever mechanism is responsible for multimodal integration.
Many different techniques could be used for achieving this integration of the semantic interpretation of the pen input, a <location> element, with the corresponding <location> element in the speech. The emma:hook annotation simply serves to indicate the existence of this relationship.
One way to achieve both the compatibility checking and fusion of content from the two modes is to use a well-defined general purpose matching mechanism such as unification. Graph unification [Graph unification] is a mathematical operation defined over directed acyclic graphs which captures both of the components of integration in a single operation: the application of the semantic constraints and the fusing of content. One possible semantics for the emma:hook markup indicates that content from the required mode needs to be unified with that position in the application semantics. In order to unify, two elements must not have any conflicting values for subelements or attributes. This procedure can be defined recursively so that elements within the subelements must also not clash, and so on. The result of unification is the union of all of the elements and attributes of the two elements that are being unified.
In addition to the unification operation, in the resulting emma:interpretation the emma:hook attribute needs to be removed and the emma:mode attribute changed to the list of the modes of the individual inputs, e.g. "voice ink".
Instead of the unification operation, for a specific application semantics, integration could be achieved using some other algorithm or script. The benefit of using the unification semantics for emma:hook is that it provides a general purpose mechanism for checking the compatibility of elements and fusing them, whatever the specific elements are in the application-specific semantic representation.
The benefit of using the emma:hook annotation for authors is that it provides an application-independent method for indicating where integration with content from another mode is required. If a general purpose integration mechanism is used, such as the unification approach described above, authors should be able to use the same integration mechanism for a range of different applications without having to change the integration rules or logic. For each application, the speech grammar rules [SRGS] need to assign emma:hook to the appropriate elements in the semantic representation of the speech. The general purpose multimodal integration mechanism will use the emma:hook annotations in order to determine where to add in content from other modes. Another benefit of the emma:hook mechanism is that it facilitates interoperability among different multimodal integration components, so long as they are all general purpose and utilize emma:hook in order to determine where to integrate content.
The following provides a more detailed example of the use of the emma:hook annotation. In this example, spoken input is combined with two ink gestures. The semantic representation assigned to the spoken input "send this file to this" indicates two locations where content is required from ink input using emma:hook="ink":
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:interpretationid="voice2" emma:medium="acoustic" emma:mode="voice" emma:tokens="send this file to this" emma:start="1087995961500" emma:end="1087995963542"> <command> <action>send</action> <arg1> <object emma:hook="ink"> <type>file</type> <number>1</number> </object> </arg1> <arg2> <object emma:hook="ink"> <number>1</number> </object> </arg2> </command> </emma:interpretation></emma:emma>
The user gesturing on the two locations on the display can be represented using emma:sequence:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"> <emma:sequenceid="ink2"> <emma:interpretationemma:start="1087995960500" emma:end="1087995960900" emma:medium="tactile" emma:mode="ink"> <object> <type>file</type> <number>1</number> <id>test.pdf</id> <object> </emma:interpretation> <emma:interpretationemma:start="1087995961000" emma:end="1087995961100" emma:medium="tactile" emma:mode="ink"> <object> <type>printer</type> <number>1</number> <id>lpt1</id> <object> </emma:interpretation> </emma:sequence></emma:emma>
A general purpose unification-based multimodal integration algorithm could use the emma:hook annotation as follows. It identifies the elements marked with emma:hook in document order. For each of those in turn, it attempts to unify the element with the corresponding element in order in the emma:sequence. Since none of the subelements conflict, the unification goes through and, as a result, we have the following EMMA for the composite result:
<emma:emma version="1.1" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2003/04/emma http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd" xmlns="http://www.example.com/example"><emma:interpretationid="multimodal2" emma:medium="acoustic tactile" emma:mode="voice ink" emma:tokens="send this file to this" emma:process="http://example.com/myintegration.xml" emma:start="1087995960500" emma:end="1087995963542"> <emma:derived-from resource="http://example.com/voice2.emma/#voice2" composite="true"/> <emma:derived-from resource="http://example.com/ink2.emma/#ink2" composite="true"/> <command> <action>send</action> <arg1> <object> <type>file</type> <number>1</number> <id>test.pdf</id> </object> </arg1> <arg2> <object> <type>printer</type> <number>1</number> <id>lpt1</id> </object> </arg2> </command></emma:interpretation></emma:emma>
This section is Informative.
The W3C Document Object Model [DOM] defines platform- and language-neutral interfaces that give programs and scripts the means to dynamically access and update the content, structure and style of documents. DOM Events define a generic event system which allows registration of event handlers, describes event flow through a tree structure, and provides basic contextual information for each event.
This section of the EMMA specification extends the DOM Event interface for use with events that describe interpreted user input in terms of a DOM Node for an EMMA document.
// File: emma.idl
#ifndef _EMMA_IDL_
#define _EMMA_IDL_

#include "dom.idl"
#include "views.idl"
#include "events.idl"

#pragma prefix "dom.w3c.org"

module emma
{
  typedef dom::DOMString DOMString;
  typedef dom::Node Node;

  interface EMMAEvent : events::UIEvent {
    readonly attribute dom::Node node;
    void initEMMAEvent(in DOMString typeArg,
                       in boolean canBubbleArg,
                       in boolean cancelableArg,
                       in Node node);
  };
};
#endif // _EMMA_IDL_
This section is Informative.
Since the publication of the EMMA 1.0 Recommendation, the following changes have been made.

- Added the emma:annotation element for specification of human annotations on the input (4.1.9)
- Added emma:process-model for specifying a non-grammar model used in processing of the input (4.1.7)
- Added emma:parameters and emma:parameter for specification of a set of parameters used to configure a processor (4.1.8)
- Added the emma:grammar-active and emma:active elements for specifying the specific grammars in a set that were active for a particular interpretation or set of interpretations (4.1.4)
- Added emma:expressed-through for specification of the modalities used in order to express an input (4.2.11)
- Added emma:result-format for specification of the specific format type for EMMA semantic payloads (4.2.18)
- Added emma:info-ref for referencing the emma:info that applies to an interpretation or set of interpretations (4.2.19)
- Allowed multiple emma:info elements
- Added emma:process-model-ref for referencing the emma:process-model that applies to an interpretation or set of interpretations (4.2.20)
- Added emma:parameter-ref for referencing the emma:parameters that applies to an interpretation or set of interpretations (4.2.21)
- Added the emma:annotated-tokens shorthand method for adding a reference transcription without needing full emma:annotation (4.2.22)
- Clarifications regarding emma:medium and emma:mode (4.2.11)
- Clarifications regarding emma:one-of
- Clarified that emma:process can be used as syntax rather than an actual reference to a process description (4.2.2)
- Clarifications regarding emma:signal (4.2.6)
- Allowed emma:annotation on lattice emma:arc
- Clarifications regarding emma:annotation and emma:annotated-tokens
- Clarification of emma:grammar-ref, including the use of emma:grammar-ref on emma:active to indicate which grammars are active
- emma:process-model and emma:parameters are required to have index scope and cannot have scope over interpretations based on their position in the document
- Added emma:device-type to 4.2.11, extended the example, and added it to the tables of relevant elements
- Added ref to several more elements, enabling documents to refer to content on the server: emma:info, emma:parameters, emma:one-of, emma:group, emma:sequence, emma:lattice
- Replaced the src attribute on emma:annotation with ref to keep it consistent with other elements that allow for reference to remote content, and added an example with EmotionML
- Added the emma:location element enabling annotation of the location of the device capturing the input (4.1.10)
- Added prev-doc and doc-ref attributes to emma:emma
- Added the emma:partial-content attribute (4.2.23)

This section is Informative.
The editors would like to recognize the contributions of the current and former members of the W3C Multimodal Interaction Working Group (listed in alphabetical order):