Abstract

This document describes requirements for the Extensible MultiModal Annotation language (EMMA) specification under development in the W3C Multimodal Interaction Activity. EMMA is intended as a data format for the interface between input processors and interaction management systems. It will define the means for recognizers to annotate application-specific data with information such as confidence scores, time stamps, input mode (e.g. key strokes, speech or pen), alternative recognition hypotheses, and partial recognition results. EMMA is a target data format for the semantic interpretation specification being developed in the Voice Browser Activity, which describes annotations to speech grammars for extracting application-specific data as a result of speech recognition. EMMA supersedes earlier work on the natural language semantics markup language in the Voice Browser Activity.
Status of this Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
W3C's Multimodal Interaction Activity is developing specifications for extending the Web to support multiple modes of interaction. This document provides the basis for guiding and evaluating subsequent work on a specification for a data format (EMMA) that acts as an exchange mechanism between input processors and interaction management components in a multimodal application. These components are introduced in the W3C Multimodal Interaction Framework.
This document is a NOTE made available by the W3C for archival purposes, and is not expected to undergo frequent changes. Publication of this Note by W3C indicates no endorsement by W3C or the W3C Team, or any W3C Members. A list of current W3C technical reports and publications, including Recommendations, Working Drafts, and Notes, can be found at http://www.w3.org/TR/.
This document has been produced as part of the W3C Multimodal Interaction Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Multimodal Interaction Working Group (W3C Members only). This is a Royalty Free Working Group, as described in W3C's Current Patent Practice NOTE. Working Group participants are required to provide patent disclosures.
Please send comments about this document to the public mailing list: www-multimodal@w3.org (public archives). To subscribe, send an email to <www-multimodal-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe).
Table of Contents
- Introduction
- 1. Scope of EMMA
- 2. Data model requirements
- 3. Annotation requirements
- 4. Integration with other work
Introduction
Extensible MultiModal Annotation language (EMMA) is the markup language used to represent human input to a multimodal application. As such, it may be seen in terms of the W3C Multimodal Interaction Framework as the exchange mechanism between user input devices and the interaction management capabilities of an application.
General Principles
An EMMA document can be considered to hold three types of data:
- instance data
The slots and values corresponding to input information which is meaningful to the consumer of an EMMA document. Instances are application-specific and built by input processors at runtime. Given that utterances may be ambiguous with respect to input values, an EMMA document may hold more than one instance.
- data model
The constraints on structure and content of an instance. The data model is typically pre-established by an application, and may be implicit, that is, unspecified.
- metadata
Annotations associated with the data contained in the instance. Annotation values are added by input processors at runtime.
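To make the three layers concrete, consider the following purely hypothetical sketch. This requirements document defines no EMMA syntax; every element and attribute name below is invented for illustration only.

    <!-- Hypothetical sketch: no EMMA syntax is defined by this document. -->
    <emma>
      <!-- metadata: annotations added by an input processor at runtime -->
      <interpretation confidence="0.75" mode="speech"
                      datamodel="http://example.org/models/flight.xsd">
        <!-- instance data: application-specific slots and values -->
        <flight>
          <origin>Boston</origin>
          <destination>Denver</destination>
        </flight>
      </interpretation>
      <!-- a second instance: the utterance was ambiguous, so the document
           holds an alternative interpretation with lower confidence -->
      <interpretation confidence="0.25" mode="speech"
                      datamodel="http://example.org/models/flight.xsd">
        <flight>
          <origin>Austin</origin>
          <destination>Denver</destination>
        </flight>
      </interpretation>
    </emma>

Here the data model is the referenced schema constraining each flight instance, while the confidence, mode and datamodel annotations are the kind of metadata EMMA itself would standardize.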
Given the assumptions above about the nature of data represented in an EMMA document, the following general principles apply to the design of EMMA:
- The main prescriptive content of the EMMA specification will consist of metadata: EMMA will provide a means to express the metadata annotations which require standardization. (Notice, however, that such annotations may express the relationship among all the types of data within an EMMA document.)
- The instance and its data model are assumed to be specified in XML, but EMMA will remain agnostic to the XML format used to express them. (The instance XML is assumed to be sufficiently structured to enable the association of annotative data.)
The following sections apply these principles in terms of the scope of EMMA, the requirements on the contents and syntax of data model and annotations, and EMMA integration with other work.
1. Scope of EMMA

- EMMA must be able to represent the following kinds of input:
- 1.1 input in any human language
- 1.2 input from the modalities and devices specified in the next section
- input reflecting the results of the following processes:
- input gained in any of the following ways:
- 1.5 single modality input
- 1.6 sequential modality input, that is, single-modality inputs presented in sequence
- 1.7 simultaneous modality input (as defined in the main MMI requirements doc).
- 1.8 composite modality input (as defined in the main MMI requirements doc; see the sketch at the end of this section).
- EMMA must be able to represent input from the following modalities, devices and architectures:
- human language input modalities
- 1.9 text
- 1.10 speech
- 1.11 handwriting
- 1.12 other modalities identified by the MMI Requirements document as required
- 1.13 combinations of the above modalities
- devices
- 1.14 telephones (i.e. no device processing, proxy agent)
- 1.15 thin clients (i.e. limited device processing)
- 1.16 rich clients (i.e. powerful device processing)
- 1.17 everything in this range
- known and foreseeable network configurations
- 1.18 architectures
- 1.19 protocols
- 1.20 extensibility to further devices and modalities
- Representation of output and other uses
EMMA is considered primarily as a representation of user input, and it is in this context that the rest of this document defines the requirements on EMMA. Given that the focus of EMMA is on meta information, sufficient need is not seen at this stage to define standard annotations for system output or for general message content between system components. However, the following requirement is included to ensure that EMMA may still be used in these cases where necessary.
- 1.21 The following uses of EMMA must not be precluded:
- a representation from which system output markup may be generated;
- a language for general purpose communication among system components.
- Ease of use and portability
- 1.22 EMMA content must be accessible via standard means (e.g. XPath).
- 1.23 Queries on EMMA content must be easy to author.
- 1.24 The EMMA specification must enable portability of EMMA documents across applications.
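As a hypothetical illustration of requirements 1.8, 1.22 and 1.23 (again, all names are invented; no EMMA syntax is defined here), a composite input combining the spoken command "zoom in here" with a pen gesture might be represented as a single interpretation drawing on both modalities:

    <emma>
      <!-- composite input: speech and pen contribute to one interpretation -->
      <interpretation mode="speech pen">
        <command>
          <action>zoom-in</action>          <!-- from speech -->
          <location mode="pen">             <!-- from the pen gesture -->
            <x>312</x>
            <y>207</y>
          </location>
        </command>
      </interpretation>
    </emma>

Given such a representation, a standard XPath expression like /emma/interpretation/command/location/x suffices to retrieve the gesture coordinate, which is the kind of easy, standards-based access that requirements 1.22 and 1.23 call for.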
2. Data model requirements

- Data model content
The following requirements apply to the use of data models in EMMA documents.
- 2.1 use of a data model and constraints must be possible, for the purposes of validation and interoperability
- 2.2 use of a data model will not be required
- in other words, it must be possible to rely on an implicit data model.
- 2.3 it must be possible in a single EMMA document to associate different data models with different instances
It is assumed that the combination and decomposition of data models will be supported by data model description formats (e.g. XML Schema), and that the comparison of data models is enabled by standard XML comparison mechanisms (e.g. use of XSLT, XPath). Therefore this functionality is not considered a requirement on EMMA data modelling.
- Data model description formats
The following requirements apply to the description format of data models used in EMMA documents.
- 2.4 existing standard formats must be able to be used, for example:
- arbitrary XML
- XML Schema
- XForms
- 2.5 no single description format is required
The use of a data model in EMMA is for the purpose of validating an EMMA instance against the constraints of a data model. Since Web applications today use different formats to specify data models, e.g. XML Schema, XForms, Relax-NG, etc., the principle that EMMA does not require a single format enables EMMA to be used in a variety of application contexts. The concern that this may lead to problems of interoperability has been discussed, and will be reviewed during production of the specification.
- 2.6 data model declarations must be able to be specified inline or referenced
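As a hedged sketch of requirements 2.3 and 2.6 (the datamodel element and attribute below are invented for illustration), a single document might reference one data model by URI for one instance while declaring another inline, in XML Schema, for a second instance:

    <emma>
      <!-- data model referenced by URI (here, an XML Schema) -->
      <interpretation datamodel="http://example.org/models/flight.xsd">
        <flight>
          <destination>Denver</destination>
        </flight>
      </interpretation>
      <!-- data model declared inline, for a different instance -->
      <interpretation>
        <datamodel>
          <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
            <xs:element name="drink" type="xs:string"/>
          </xs:schema>
        </datamodel>
        <drink>coffee</drink>
      </interpretation>
    </emma>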
3. Annotation requirements

- Annotation content
EMMA must enable the specification of the following features. For each annotation feature, "local" annotation is assumed: that is, that the association of the annotation may be at any level within the instance structure, and not only at the highest level.
- General meta data
- 3.1 lack of input
- 3.2 uninterpretable input
- 3.3 identification of input source
- 3.4 time stamps
- 3.5 relative positioning of input events
(NB: This requirement is covered explicitly by time stamps, but reflects use of EMMA in environments in which time stamping may not be possible.)
- 3.6 temporal grouping of input events
- 3.7 human language of input
- 3.8 identification of input modality
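A hypothetical sketch of how several of these general metadata annotations might attach to an instance (the attribute names are invented; they stand in for 3.3, 3.4, 3.7 and 3.8, with 3.2 shown as an empty, annotated result):

    <emma>
      <!-- 3.3 input source, 3.4 time stamps, 3.7 human language,
           3.8 input modality -->
      <interpretation source="asr-engine-1"
                      start="2003-01-13T12:00:00.210"
                      end="2003-01-13T12:00:02.750"
                      lang="en-US" mode="speech">
        <destination>Denver</destination>
      </interpretation>
      <!-- 3.2 uninterpretable input, sketched as an annotated empty result -->
      <interpretation mode="speech" uninterpretable="true"/>
    </emma>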
- Annotational structure
- 3.9 association to corresponding instance element annotated
- 3.10 reference to data model definition
- 3.11 composite multimodal input: representation of input from multiple modalities.
- Recognition (signal --> tokens processing)
- 3.12 reference to signal
- 3.13 reference to processing used (e.g. SRGS grammar)
- 3.14 tokens of utterance
- 3.15 ambiguity
This enables a tree-based representation of local ambiguity. That is, alternatives are expressible for given nodes in the structure.
- 3.16 confidence scores of recognition
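A hypothetical sketch of such a tree-based representation (names invented): alternatives are expressed locally, for a single node, each carrying its own recognition confidence (3.15, 3.16), while unambiguous sibling nodes remain plain:

    <emma>
      <interpretation>
        <destination>
          <!-- local ambiguity: alternatives for this node only -->
          <one-of>
            <item confidence="0.6">Boston</item>
            <item confidence="0.4">Austin</item>
          </one-of>
        </destination>
        <date>20030113</date>   <!-- unambiguous sibling node -->
      </interpretation>
    </emma>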
- Interpretation (tokens --> semantic processing)
- 3.17 tokens of utterance
- 3.18 reference to processing used (e.g. SRGS)
- 3.19 ambiguity
- 3.20 confidence scores of interpretation
- Recognition and Interpretation (signal --> semantic processing)
- 3.21 union of Recognition/Interpretation features (e.g. SRGS + SI)
- Modality-dependent annotations
- 3.22 EMMA must be extensible to annotations which are specific to particular modalities, e.g. those of:
4. Integration with other work

- 4.1 Where such alignment is appropriate, EMMA must enable the use and integration of widely adopted standard specifications and features. The following activities are considered most relevant in this respect:
- W3C activities
- MMI activities
- MMI general requirements
- Events subgroup requirements
- Integration subgroup requirements
- Ink subgroup requirements
- Voice Browser activities
- SRGS: EMMA must enable results from speech using SRGS
- SI: EMMA must enable results from speech using SRGS with SI output (see the sketch at the end of this document)
- Other W3C activities
- Relevant XML-related activities
- RDF working group
- Other organizations and standards
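To make the SRGS/SI integration concrete: the grammar below uses the XML form of SRGS with Semantic Interpretation tags, following the Voice Browser drafts, while the EMMA wrapper underneath is, as throughout this document, only a hypothetical sketch (invented names) of how the resulting application-specific data might be annotated.

    <grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
             xml:lang="en" root="order" tag-format="semantics/1.0">
      <rule id="order" scope="public">
        I would like
        <one-of>
          <item>coffee <tag>out.drink = "coffee";</tag></item>
          <item>tea <tag>out.drink = "tea";</tag></item>
        </one-of>
      </rule>
    </grammar>

    <!-- hypothetical EMMA wrapping of the SI result, e.g. {drink: "coffee"} -->
    <emma>
      <interpretation mode="speech" confidence="0.8"
                      grammar="http://example.org/order.grxml">
        <drink>coffee</drink>
      </interpretation>
    </emma>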