Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document describes fundamental requirements for the specifications under development in the W3C Multimodal Interaction Activity. These requirements were derived from use case studies as discussed in Appendix A. They have been developed for use by the Multimodal Interaction Working Group (W3C Members only), but may also be relevant to other W3C working groups and related external standards activities.
The requirements cover general issues, inputs, outputs, architecture, integration, synchronization points, runtimes and deployments, but this document does not address application or deployment conformance rules.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
W3C's Multimodal Interaction Activity is developing specifications for extending the Web to support multiple modes of interaction. This document describes fundamental requirements for multimodal interaction.
This document has been produced as part of the W3C Multimodal Interaction Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Multimodal Interaction Working Group (W3C Members only). This is a Royalty Free Working Group, as described in W3C's Current Patent Practice NOTE. Working Group participants are required to provide patent disclosures.
Please send comments about this document to the public mailing list: www-multimodal@w3.org (public archives). To subscribe, send an email to <www-multimodal-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe).
A list of current W3C Recommendations and other technical documents, including Working Drafts and Notes, can be found at http://www.w3.org/TR/.
Multimodal interactions extend the Web user interface to allow multiple modes of interaction, offering users the choice of using their voice, or an input device such as a keypad, keyboard, mouse or stylus. For output, users will be able to listen to spoken prompts and audio, and to view information on graphical displays. This capability for the user to specify the mode or device for a particular interaction in a particular situation is expected to significantly improve the user interface, its accessibility and reliability, especially for mobile applications. The W3C Multimodal Interaction Working Group (WG) is developing markup specifications for authoring applications synchronized across multiple modalities or devices with a wide range of capabilities.
This document is an internal working draft prepared as part of the discussions on requirements for multimodal interaction specifications.
The work on the present requirements document started from the Multimodal Requirements for Voice Markup Languages public working draft (version 1.0) published by the W3C Voice Browser Activity [MM Req Voice]. The outline of the document remains very similar.
The present requirements scope the nature of the work and specifications that will be developed by the W3C Multimodal Interaction Working Group (as specified by the charter [MMI Charter]). These intended works may be referred to below as "specification(s)".
The requirements in this document do not express conformance rules on application, platform runtime implementation or deployment.
In this document, the following conventions have been followed when phrasing the requirements:
It is not required that a particular specification produced by the W3C MMI Working Group address all the requirements in this document. It is possible that the requirements are addressed by different specifications and that all the "MUST specify" requirements are only satisfied by combining the different specifications produced by the W3C Multimodal Interaction Working Group. However, in such a case, it should be possible to clearly indicate which specification will address which requirements.
To lay the groundwork for the technical requirements, we first discuss an intended frame of reference for a multimodal system, introducing various concepts and terms that will be referred to in the normative sections below. For the reader's convenience, we have collected the concepts and terms introduced in this frame of reference in the glossary.
We are interested in defining the requirements for the design of multimodal systems: systems that support a user communicating with an application by using different modalities such as voice (in a human language), gesture, handwriting, typing, audio-visual speech, etc. The user may be considered to be operating in a delivery context: a term used to specify the set of attributes that characterizes the capabilities of the access mechanism in terms of device profile, user profile (e.g. identity, preferences and usage patterns) and situation. The user interacts with the application in the context of a session, using one or more modalities (which may be realized through one or more devices). Within a session, the user may suspend and resume interaction with the application within the same modality or switch modalities. A session is associated with a context, which records the interactions with the user.
In multimodal systems, an event is a representation of some asynchronous occurrence of interest to the multimodal system. Examples include mouse clicks, hanging up the phone, and speech recognition results or errors. Events may be associated with information about the user interaction, e.g. the location where the mouse was clicked. A typical event source is a user; such events are called input events. An external input event is one not generated by a user, e.g. a GPS signal. The multimodal system may also produce external output events for external systems (e.g. a logging system). In order to preserve temporal ordering, events may be time stamped. Typically, events are formalized as generated by event sources and associated with event handlers, which subscribe to the event and are notified of its occurrence. This is exemplified by the XML Events model.
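As a minimal, hedged illustration of this model (the element identifiers and handler reference below are invented for the example), an XML Events listener declaratively binds an event source to a handler:

<!-- Illustrative XML Events declaration: DOM "click" events raised on the
     element with id "submitButton" are dispatched to the handler element
     identified by "#confirmHandler"; both identifiers are hypothetical. -->
<listener xmlns="http://www.w3.org/2001/xml-events"
          event="click"
          observer="submitButton"
          handler="#confirmHandler"/>

Because the observer, the event type and the handler are all expressed declaratively, the same pattern can be reused for modality-specific events without scripting against a particular DOM implementation.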
The user typically provides input in one or more modalities and receives output in one or more modalities. Input may be classified as sequential, simultaneous or composite. Sequential input is input received on a single modality, though that modality can change over time. Simultaneous input is input received on multiple modalities and treated separately by downstream processes (such as interpretation). Composite input is input received on multiple modalities at the same time and treated as a single, integrated "composite" input by downstream processes. Inputs are combined using the coordination capability of the multimodal system, typically driven by input constraints or decided by the interaction manager.
Input is typically subject to input processing. For instance, speech input may be passed to a speech recognition engine, including, for instance, semantic interpretation in order to extract meaningful information (e.g. a semantic representation) for downstream processing. Note that simultaneous and composite input may be conflicting, in that the interpretations of the input may not be consistent (e.g. the user says "yes" but clicks on "no").
Two fundamentally different uses of multimodality may be identified: supplementary multimodality and complementary multimodality. An application makes supplementary use of multimodality if it allows every interaction (input or output) to be carried through to completion in each modality as if it were the only available modality. Such an application enables the user to select at any time the modality that is best suited to the nature of the interaction and the user's situation. Conversely, an application makes complementary use of multimodality if interactions in one modality are used to complement interactions in another. (For instance, the application may visually display several options in a form and aurally prompt the user "Choose the city to fly to".) Complementary use may help a particular class of users (e.g. those with dyslexia). Note that in an application supporting complementary use of different modalities, each interaction may not be accessible separately in each modality. Therefore it may not be possible for the user to determine which modality to use. Instead, the document author may prescribe the modality (or modalities) to be used in a particular interaction.
The synchronization behavior of an application describes the way in which any input in one modality is reflected in the output in another modality, as well as the way input is combined across modalities (coordination capability). The synchronization granularity specifies the level at which the application coordinates interactions. The application is said to exhibit event-level synchronization if user inputs in one modality are captured at the level of individual DOM events and immediately reflected in the other modality. The application exhibits field-level synchronization if inputs in one modality are reflected in the other after the user changes focus (e.g. moves from input field to input field) or completes the interaction (e.g. completes a selection in a menu). The application exhibits form-level synchronization if inputs in one modality are reflected in the other only after a particular point in the presentation is reached (e.g. after a certain number of fields have been completed in the form).
The output generated by a multimodal system can take various forms, e.g. audio (including spoken prompts and playback, e.g. using natural language generation and text-to-speech (TTS), which synthesizes audio), visual (e.g. XHTML or SVG markup rendered on displays), lip synchronization (multimedia output in which there is a visual rendition of a face whose lip movements are synchronized with the audio), etc. Of relevance here is the W3C Recommendation SMIL 2.0, which enables simple authoring of interactive audiovisual applications and supports media synchronization.
Interaction (input, output) between the user and the application may often be conceptualized as a series of dialogs, managed by an interaction manager. A dialog is an interaction between the user and the application which involves turn taking. In each turn, the interaction manager (working on behalf of the application) collects input from the user, processes it (using the session context and possibly external knowledge sources) to determine the intent of the user, computes a response and updates the presentation for the user. An interaction manager relies on strategies to determine focus and intent as well as to disambiguate, correct and confirm sub-dialogs. We typically distinguish directed dialogs (e.g. user-driven or application-driven) and mixed initiative or free flow dialogs.
The interaction manager may use (1) inputs from the user, (2) the session context, (3) external knowledge sources, and (4) disambiguation, correction, and confirmation sub-dialogs to determine the user's focus and intent. Based on the user's focus and intent, the interaction manager also (1) maintains the context and state of the application, (2) manages the composition of inputs and synchronization across modalities, (3) interfaces with business logic, and (4) produces output for presentation to the user. In some architectures, the interaction manager may have distributed components, utilizing an event-based mechanism for coordination.
Finally, in this document, we use the term configuration or execution model to refer to the runtime structure of the various system components and their interconnection, in a particular manifestation of a multimodal system.
It is the intent of the WG to define specifications that apply to a variety of multimodal capabilities and deployment conditions.
(MMI-G1): The multimodal specifications MUST support authoring multimodal applications for a wide range of multimodal capabilities (MUST specify).
The specifications should support different combinations of input and output modalities, synchronization granularity, configurations and devices. Some aspects of this requirement are elaborated in detail below. For instance, the range of synchronization granularities is addressed by requirement MMI-A6.
It is advantageous that the specifications allow the application developer to author a single version of the application, instead of multiple versions targeted at combinations of multimodal capabilities.
(MMI-G2): The multimodal specifications SHOULD support authoring multimodal applications once for deployment on different devices with different multimodal capabilities (NICE to specify).
The multimodal capabilities may differ based on available modalities, presentation and interaction capability for each modality (modality-specific delivery context), synchronization granularity, available devices and their configurations, etc. These are captured in the delivery context associated with the multimodal system.
(MMI-G3): The multimodal specifications MUST support supplementary use of modalities (MUST specify).
Supplementary use of modalities in multimodal applications significantly improves the accessibility of the applications. The user may select the modality best suited to the nature of the interaction and the context of use.
When supported by the runtime or prescribed by the author, it may be possible for the user to combine modalities, as discussed for example in requirement MMI-I7 about composite input.
(MMI-G4): The multimodal specifications MUST support complementary use of modalities (MUST specify).
Authors of multimodal applications that rely on complementary multimodality should pay special attention to the accessibility of the application, for example by ensuring accessibility in each modality or by providing supplementary alternatives.
(MMI-G5): The multimodal specifications will be designed such that an author can write applications where the synchronization of the various modalities is seamless from the user's point of view (MUST specify).
To elaborate, an interaction event or an external event in one modality results in a change in another, based on the synchronization granularity supported by the application. See section 4.5 for a discussion of synchronization granularities.
Seamlessness can encompass multiple aspects:
Limited latency in the synchronization behavior with respect to what is expected by the user for the particular application and multimodal capabilities.
Predictable, non-confusing multimodal behavior.
Expanding on the considerations made in section 1.1, it is important to support authoring for any granularity of synchronization covered in (MMI-A6):
(MMI-G6): The multimodal specifications MUST support authoring seamless synchronization of various modalities for any synchronization granularity (MUST specify).
Coordination is defined as the capability to combine multimodal inputs into composite inputs based on an interpretation algorithm that decides what makes sense to combine based on the context. Composite inputs are further discussed in section 2.4. It is a notion different from the synchronization granularity described in section 4.5.
The following requirement is proposed in order to address the combinatorial explosion of synchronization granularities that the application developer must author for.
(MMI-G7): The multimodal specifications SHOULD support authoring seamless synchronization of various modalities once for deployment across a whole range of synchronization granularities or coordination capabilities (NICE to specify).
This requirement addresses the capability for the application developer to write the application once for a particular synchronization granularity or coordination capability and to have the application able to adapt its synchronization behavior when other levels are available.
Multimodal applications are no different from any other web applications. It is important that the specifications not be limited to specific human languages.
(MMI-G8): The multimodal specifications MUST support authoring multimodal applications in any human language (MUST specify).
In particular, it must be possible to apply conventional methods for localization and internationalization of applications.
(MMI-G9): The multimodal specifications MUST not preclude the capability to move a multimodal application from one human language to another without having to rewrite the whole application (MUST specify).
For example, it should be possible to encapsulate language-specific items separately from the language-independent description.
It is important that multimodal applications remain easy to author and deploy in order to allow wide adoption by the web community.
(MMI-G10): The multimodal specifications produced by the MMI Working Group MUST be easy to implement and use (MUST specify).
This is a generic requirement that requires designers to consider from the outset issues of ease of authoring by application developers, ease of implementation by platform developers and ease of use by the user. Thus it affects authoring, platform implementation and deployment.
The following requirement qualifies this further to guarantee that the specifications will be widely deployable with existing technologies (e.g. standards, network and client capabilities, etc.).
(MMI-G11): The multimodal specifications produced by the MMI Working Group MUST depend only on technologies that are widely available during the lifetime of the working group (MUST specify).
For W3C specifications, wide availability is understood as having reached at least the stage of Candidate Recommendation.
Related considerations are made in section 4.1.
The multimodal specifications will provide mechanisms to develop and deploy accessible applications, as discussed in section 1.2.
In addition, it is important that, as for all other web applications, the following requirement be satisfied:
(MMI-G12): The multimodal specifications produced by the MMI Working Group MUST not preclude conforming to the W3C accessibility guidelines (MUST specify).
This is especially important for applications that make complementary use of modalities.
Early deployments of multimodal applications show that security and privacy issues can be very critical for multimodal deployments. While addressing these issues is not directly within the scope of the W3C Multimodal Interaction Working Group, it is important that these issues be considered.
(MMI-G13): The multimodal specifications SHOULD be aligned with the W3C work and specifications for security and privacy (SHOULD specify).
The following security and privacy considerations have been identified so far:
Other considerations and issues may exist and should becompiled.
Notions of profile and delivery context have been widely introduced to characterize the capabilities of devices and the preferences of users.
From a multimodal point of view, different types of profiles are relevant:
These profiles are combined into the notion of delivery context introduced by the W3C Device Independence Activity [DI Activity]. The delivery context captures the set of attributes that characterize the capabilities of the access mechanism (device or devices) (device profile), the dynamic preferences of the user (as they relate to interaction through this device) and configurations. The delivery context may dynamically change as the application progresses, as the user's situation changes (situationalization) or as the number and configurations of the devices change.
CC/PP is an example of a formalism to describe and exchange the delivery context [CC/PP].
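As a hedged sketch of what such a delivery context description could look like (the profile URIs, the "ex:" vocabulary and the attribute values are illustrative placeholders rather than a normative CC/PP vocabulary, and the ccpp namespace shown follows the CC/PP structure drafts), an RDF fragment describing one hardware component might be:

<?xml version="1.0"?>
<!-- Illustrative CC/PP-style fragment: one hardware component of a delivery
     context, listing display size and available input modalities. The ex:
     properties are placeholders for an agreed vocabulary. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ccpp="http://www.w3.org/2002/11/08-ccpp-schema#"
         xmlns:ex="http://example.org/deliveryContext#">
  <rdf:Description rdf:about="http://example.org/profile#MyDevice">
    <ccpp:component>
      <rdf:Description rdf:about="http://example.org/profile#TerminalHardware">
        <ex:displayWidth>320</ex:displayWidth>
        <ex:displayHeight>240</ex:displayHeight>
        <ex:audioInput>true</ex:audioInput>
        <ex:penInput>true</ex:penInput>
      </rdf:Description>
    </ccpp:component>
  </rdf:Description>
</rdf:RDF>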
Users of multimodal interactions will expect to be able to rely on these profiles to optimize the way that multimodal applications are presented to them.
(MMI-G14): The multimodal specifications MUST enable optimization and adaptation of multimodal applications based on the delivery context or dynamic changes of the delivery context (MUST specify).
Dynamic changes of the delivery context encompass situations where the available devices, modalities and configurations, or usage preferences, change dynamically. These changes can be involuntary or initiated by the user, the application developer or the service provider.
(MMI-G15): The multimodal specifications MUST enable authors to specify how the delivery context and changes of the delivery context affect the multimodal interface of a particular application (MUST specify).
The description of such impacts on a multimodal application could be specified by the author but modified by the user, platform vendor or service provider. In particular, the author can describe how the application can be affected or adapted to the delivery context, but the user and service providers should be able to modify the delivery context. Other use cases should also be considered.
It is expected that the author of a multimodal application should always be able to specify the expected flow of navigation (i.e. sequence of interaction) through the application or the algorithm to determine such a flow (e.g. in mixed initiative cases). This leads to the following requirement:
(MMI-G16): The multimodal specifications MUST enable the author of an application to describe the navigation flow through the application or indicate the algorithms used to determine the navigation flow (MUST specify).
Numerous modalities or input types require some form of processing before the nature of the input is identified. For instance, speech input requires speech detection and speech recognition, which requires specific data files (e.g. grammars, language models, etc.). Similarly, handwritten input requires recognition.
(MMI-I1): The multimodal specifications MUST provide a mechanism to specify and attach modality-related information when authoring a multimodal application (MUST specify).
This implies that authors should be able to include modality-related information, such as the media types, processing requirements or fallback mechanisms that a user agent will need for the particular modality. Mechanisms should be available to make this information available to the user agent.
For example, audio input may be recognized (speech recognizer), recorded, or processed by speaker recognition or natural language processing, using specific data files (e.g. a grammar or language model), etc. The author must be able to completely define such processing steps.
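A speech grammar is a typical example of such a data file. The fragment below is a minimal sketch using the XML form of the W3C Speech Recognition Grammar Specification (SRGS); the rule name and the list of cities are purely illustrative:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal illustrative SRGS grammar: constrains a spoken "destination"
     input to three cities. A real application would reference a much
     larger, generated data file. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="destination">
  <rule id="destination" scope="public">
    <one-of>
      <item>Boston</item>
      <item>Paris</item>
      <item>Tokyo</item>
    </one-of>
  </rule>
</grammar>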
(MMI-I2): The multimodal specifications developed by the MMI Working Group MUST support sequential multimodal input (MUST specify).
It implies that
(MMI-I3): The multimodal specifications developed by the MMI Working Group MUST support simultaneous multimodal input (MUST specify).
(MMI-I4): The multimodal specifications MUST enable the author to specify the granularity of input synchronization (MUST specify).
It should be remarked, however, that the actual granularity of input synchronization may be decided by the user, by the runtime or by the network (delivery context), or some combination thereof.
(MMI-I5): The multimodal specifications MUST enable the author to specify how the multimodal application evolves when the granularity of input synchronization is modified by external factors (MUST specify).
This requirement enables the application developer to specify how the performance of the application can degrade gracefully with changes in the input mechanism. For instance, it should be possible to access an application designed for event-level or field-level synchronization between voice (on the server side) and GUI (on the terminal) on a network that permits only session-level synchronization (that is, permits only sequential multimodality).
(MMI-I6): The multimodal specifications SHOULD enable a default input synchronization behavior and provide "overwrite" mechanisms (SHOULD specify).
Therefore, it should be possible to author multimodal applications while assuming a default synchronization behavior, for example supplementary event-level multimodal synchronization granularity.
(MMI-I7): The multimodal specifications developed by the MMI Working Group MUST support composite multimodal input (MUST specify).
(MMI-I8): The multimodal specifications SHOULD allow the author to specify how input combination is achieved, possibly taking into account the coordination capabilities available in the given delivery context (NICE to specify).
This can be achieved with explicit scripts that describe the interpretation and composition algorithms. On the other hand, it may also be left to the interaction manager to apply an interpretation strategy that includes composition, for example by determining the most sensible interpretation given the session context and therefore determining what input combination (if any) to select. This is addressed by the following requirement.
(MMI-I9): The multimodal specifications SHOULD enable the author to specify the mechanism used to decide when coordinated inputs are to be combined and how they are combined (NICE to specify).
Possible ways to address this include:
(MMI-I10): The multimodal specifications MUST support the description of input to be obtained from:
(MUST specify).
(MMI-I11): The multimodal specifications SHOULD support other input modes, including:
(NICE to specify).
(MMI-I12): The multimodal specifications MUST describe how extensibility is to be achieved and how new devices or modalities can be added (MUST specify).
(MMI-I13): The multimodal specifications MUST support the representation of the meaning of a user input (MUST specify).
(MMI-I16): The multimodal specifications MUST enable the coordination of input constraints across modalities (MUST specify).
Input constraints specify, for example through grammars, how inputs can be combined via rules or interaction management strategies. For example, the markup language may coordinate grammars for modalities other than speech with speech grammars to avoid duplication of effort in authoring multimodal grammars.
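As a purely hypothetical authoring sketch (the constraints, field, speech-input and select elements below are invented for illustration and do not correspond to any existing W3C markup), such coordination could allow one declaration of the legal values to drive both the speech grammar and the graphical menu for the same field:

<!-- Hypothetical markup: the "cities" constraints are declared once and
     reused by the speech modality (to generate a grammar) and by the GUI
     modality (to populate a menu) for the same "destination" field. -->
<constraints id="cities">
  <item value="BOS">Boston</item>
  <item value="CDG">Paris</item>
  <item value="HND">Tokyo</item>
</constraints>

<field name="destination">
  <speech-input grammar-from="#cities"/>
  <select options-from="#cities"/>
</field>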
Possible ways to address this could include:
These methods will be considered during the specification work.
When using multiple modalities or user agents, a user may introduce errors, consciously or inadvertently. For example, in a voice and GUI multimodal application, the user may say "yes" while simultaneously clicking on "no" in the user interface. We require that the specifications support the detection of such conflicts.
(MMI-I17): The multimodal specifications MUST support the detection of conflicting input from several modalities (MUST specify).
It is naturally expected that the author will specify how to handle the conflict through an explicit script or piece of code. It is also possible that an interaction management strategy will be able to detect the possible conflict and provide a strategy or sub-dialog to resolve it.
The interaction manager should be able to place different input events on a timeline in order to determine the intent of the user.
(MMI-I18): The multimodal specifications MUST provide mechanisms to position input events relative to each other in time (MUST specify).
(MMI-I19): The multimodal specifications SHOULD provide mechanisms to allow for temporal grouping of input events (SHOULD specify).
These requirements may be satisfied by mechanisms to order the input events or, when needed, by relative time stamping. For some configurations, this may involve clock synchronization.
(MMI-O1): The multimodal specifications developed by the MMI Working Group MUST support sequential media output (MUST specify).
As SMIL supports the sequencing of media, the specifications are expected to rely on a similar mechanism. This is addressed in more detail in other requirements.
It implies that
(MMI-O2): The multimodal specifications MUST provide the ability to synchronize different output media with different granularities (MUST specify).
This covers simultaneous outputs. The granularity of output synchronization, as provided by SMIL, may range from no synchronization at all between the media (other than playing them in parallel) to tight synchronization mechanisms.
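As a minimal, hedged sketch of these two ends of the range (the media file names are illustrative), SMIL 2.0 expresses sequential output with seq and parallel output with par:

<!-- Illustrative SMIL 2.0 fragment: a welcome prompt is played first, then
     a map image and its spoken description are rendered in parallel. -->
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <body>
    <seq>
      <audio src="welcome-prompt.wav"/>
      <par>
        <img src="route-map.png" dur="10s"/>
        <audio src="route-description.wav"/>
      </par>
    </seq>
  </body>
</smil>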
(MMI-O3): The multimodal specifications MUST enable the author to specify the granularity of output synchronization (MUST specify).
However, it should be possible for the granularity of output media synchronization to be decided by the user, runtime or network (delivery context).
(MMI-O4): The multimodal markup MUST enable the author to specify how the multimodal application degrades when the granularity of output synchronization is modified by external factors (MUST specify).
(MMI-O5): The multimodal specifications SHOULD rely on a default output synchronization behavior for a particular granularity and SHOULD provide "overwrite" mechanisms (SHOULD specify).
(MMI-O6): The multimodal specifications MUST support as output media:
(MUST specify).
(MMI-O7): The multimodal specifications SHOULD support additional media outputs like:
(NICE to specify).
(MMI-O8): The multimodal specifications MUST describe how extensibility is to be achieved and how new output media can be added (MUST specify).
(MMI-O9): The multimodal specifications MUST support the specification of which output media should be processed and how this should be done. The specifications MUST provide a mechanism that describes how this can be achieved or extended for different modalities (MUST specify).
Examples of output processing may include: adaptation or styling of presentation for particular modalities, speech synthesis of text output into audio output, natural language generation, etc.
(MMI-A1): Where the functionality is appropriate, and clean integration is possible, the multimodal specifications MUST enable the use and integration of existing standard language specifications, including visual, aural, voice and multimedia standards (MUST specify).
In general, it is understood that in order to satisfy MMI-G11, dependencies of the multimodal specifications on other specifications must be carefully evaluated if these are not yet W3C Recommendations or not yet widely adopted.
SMIL 2.0 provides multimedia synchronization mechanisms. Therefore, MMI-A1 implies:
(MMI-A1a): The multimodal specifications MUST enable the synchronization of input and output media through SMIL 2.0 as a control mechanism (MUST specify).
The following requirement results from MMI-A1.
(MMI-A2): The multimodal specifications MUST be expressible in terms of XHTML modularization (MUST specify).
(MMI-A3): The multimodal specifications MUST allow the separation of data model, presentation layer and application logic in the following ways:
(MUST specify).
This will enable the multimodal specifications to be compatible with XForms in environments which support XForms. This would comply with MMI-A1.
From an authoring point of view, it is important to have mechanisms (events, protocols, handlers) to detect or prescribe the modalities that are or should be available, i.e. to check the delivery context and to adapt to the delivery context. This is covered by MMI-G14 and MMI-G15.
(MMI-A4): There MUST be events associated with changes of the delivery context and mechanisms to specify how to handle these events by adapting the multimodal application (MUST specify).
(MMI-A5): There SHOULD be mechanisms available to define the delivery context or behavior that is expected or recommended by the author (SHOULD specify).
(MMI-A6): The multimodal specifications MUST support synchronization granularities at the following levels of synchronization:
(MUST specify).
In addition,
The following requirement results from MMI-A1.
(MMI-A7a): Event-level synchronization MUST follow the DOM event model (MUST specify).
(MMI-A7b): Event-level synchronization SHOULD follow XML Events (SHOULD specify).
Such events are not limited to events generated by user interactions, as discussed in MMI-A16.
It is important that the application developer be able to fully define the synchronization granularity.
(MMI-A8): The multimodal specifications MUST enable the author to specify the granularity of synchronization (MUST specify).
However:
(MMI-A9): It MUST be possible for the granularity of synchronization to be decided by the user, runtime or network (through the delivery context) (MUST specify).
(MMI-A10): The multimodal specifications MUST enable the author to specify how the multimodal application degrades when the granularity of synchronization is modified by external factors (MUST specify).
(MMI-A11): The multimodal specifications SHOULD rely on a default input and output synchronization behavior and SHOULD provide "overwrite" mechanisms (SHOULD specify).
Nothing imposes that input and output, even in the same modality, be provided by the same device or user agent. The input and output can be independent, and the granularity of interfaces afforded by the specifications should apply independently to the mechanisms of input and output within a given modality when necessary.
(MMI-A12): The specifications MUST support separate interfaces for input and output, even within the same modality (MUST specify).
(MMI-A13): The multimodal specifications MUST support synchronization of different modalities or devices distributed across the network, providing the user with the capability to interact through different devices (MUST specify).
In particular, this includes multi-device applications where different devices or user agents are used to interact with the same application; these may involve presentation in the same modality but on different devices.
Distribution of input and output processing refers to cases where the processing algorithms applied to input and output may be performed by distributed components.
(MMI-A14): The multimodal specifications MUST support the distribution of input and output processing (MUST specify).
(MMI-A15): The multimodal specifications MUST support the expression of some level of control over the distributed processing of input and output (MUST specify).
This requirement is related to MMI-I1 and MMI-O9.
(MMI-A16): The multimodal specifications MUST enable the author to specify how multimodal applications handle external input events and generate external output events used by other processes (MUST specify).
Examples of input events include camera, sensor or GPS events. Examples of output events include any form of notification or trigger generated by the user interaction.
This is expected to be automatically satisfied if events are treated as XML Events.
Requirements MMI-I18 and MMI-I19 generalize as follows.
(MMI-A17): The multimodal specifications MUST provide mechanisms to position input and output events relative to each other in time (MUST specify).
(MMI-A18): The multimodal specifications SHOULD provide mechanisms to allow for temporal grouping of input and output events (SHOULD specify).
These requirements may be satisfied by mechanisms to order the events or, when needed, by relative time stamping. For some configurations, this may involve clock synchronization.
It is expected that users will interact with multimodal applications through different deployment configurations (i.e. architectures): the different modules responsible for media rendering, input capture, processing, synchronization, interpretation, etc., may be partitioned or combined on a single device or distributed across several devices or servers. As previously discussed, these configurations may dynamically change.
The specification of such configurations is beyond the scope of the W3C Multimodal Interaction Working Group. However:
(MMI-C1): The multimodal specifications MUST support the deployment of multimodal applications authored according to the W3C MMI specifications with all the relevant deployment configurations, where functions are partitioned or combined on a single engine or distributed across several devices or servers (MUST specify).
The possibility of interacting with multiple devices leads naturally to multi-user access to applications.
(MMI-C2): The multimodal specifications SHOULD support multi-user deployments (NICE to specify).
Multimodal interactions are especially important for mobile deployments. Therefore, the W3C Multimodal Interaction Working Group will pay attention to the constraints associated with mobile deployments, and especially cell phones.
(MMI-R1): The multimodal specifications MUST be compatible with deployments based on user agents / renderers that run on mobile platforms (MUST specify).
Mobile platforms, like smart phones, are typically constrained in terms of processing power and available memory. It is expected that the multimodal specifications will take such constraints into account and be designed so that multimodal deployments are possible on smart phones.
In addition, it is important to pay attention to the challenges introduced by mobile networks, such as limited bandwidth, delays, etc.:
(MMI-R2): The multimodal specifications MUST support deployments over mobile networks, considering the bandwidth limitations and delays that they may introduce (MUST specify).
This may enable deployment techniques or specifications from other standards activities to provision the necessary quality of service.
The following requirements apply to the objectives for the specification work on EMMA as defined in the glossary. EMMA is intended to support the necessary exchanges of information between the multimodal modules mentioned in section 5.1.
(MMI-E1): The multimodal specifications MUST support the generation, representation and exchange of input events and the results of input or output processing (MUST specify).
(MMI-E2): The multimodal specifications MUST support the generation, representation and exchange of interpretations and combinations of input events and the results of input or output processing (MUST specify).
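Since EMMA is still being defined, the fragment below is only a hedged sketch of the kind of annotated result such a representation could carry (the namespace, annotation names and values are illustrative, not normative): a single interpretation of the spoken input "fly to Boston", annotated with its medium, mode, confidence and time stamps so that downstream components can combine it with input from other modalities.

<!-- Illustrative EMMA-style fragment: the semantic interpretation of one
     spoken input, with annotations an interaction manager could use for
     composition; all names and values are examples only. -->
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="spoken-destination"
                       emma:medium="acoustic"
                       emma:mode="voice"
                       emma:confidence="0.87"
                       emma:start="1043868182000"
                       emma:end="1043868183200">
    <destination>Boston</destination>
  </emma:interpretation>
</emma:emma>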
(MMI-S1): The multimodal specifications MUST enable authoring of the generation of asynchronous events and their handlers (MUST specify).
(MMI-S2): The multimodal specifications MUST enable authoring of the generation of synchronous events and their handlers (MUST specify).
(MMI-S3): The multimodal specifications MUST support event handlers local to the event generator (MUST specify).
(MMI-S4): The multimodal specifications MUST support event handlers remote to the event generator (MUST specify).
(MMI-S5): The multimodal specifications MUST support the exchange of EMMA fragments as part of the synchronization event content (MUST specify).
(MMI-S6): The multimodal specifications MUST support the specification of event handlers for externally generated events (MUST specify).
(MMI-S7): The multimodal specifications MUST support the specification of event handlers for externally generated events that result from the interaction of the user (MUST specify).
(MMI-S8): The multimodal specifications MUST support handlers that manipulate or update the presentation associated with a particular modality (MUST specify).
In distributed configurations, it is important that synchronization exchanges take place with minimum delays. In practical deployments this implies that the highest available quality of service should be allocated to such exchanges.
(MMI-S9): The multimodal specifications MUST enable the identification of multimodal synchronization exchanges (MUST specify).
This would enable the underlying network to allocate the highest quality of service to synchronization exchanges, if it is aware of such needs. This network behavior is beyond the scope of the multimodal specifications.
(MMI-S10): The multimodal specifications MUST support confirmation of event handling (MUST specify).
(MMI-S11): The multimodal specifications MUST support event generation or event handling pending confirmation of a particular event handling (MUST specify).
(MMI-S12a): The multimodal specifications MUST be compatible with existing standards, including the DOM events and DOM specifications (MUST specify).
(MMI-S12b): The multimodal specifications SHOULD be compatible with existing standards, including the XML Events specification (SHOULD specify).
(MMI-S13): The multimodal specifications MUST allow lightweight multimodal synchronization exchanges compatible with wireless networks and mobile terminals (MUST specify).
This last requirement is derived from MMI-R1 and MMI-R2.
[CC/PP]: W3C CC/PP Working Group, URI: http://www.w3c.org/Mobile/CCPP/.
[DI Activity]: W3C Device Independence Activity, URI: http://www.w3c.org/2001/di/.
[MMI Charter]: W3C Multimodal Interaction Working Group Charter, URI: http://www.w3c.org/2002/01/multimodal-charter.html.
[MMI WG]: W3C Multimodal Interaction Working Group, URI: http://www.w3c.org/2002/mmi/.
[MM Req Voice]: Multimodal Requirements for Voice Markup Languages, W3C Working Draft, URI: http://www.w3c.org/TR/multimodal-reqs.
This section is informative.
This document was jointly prepared by the members of the W3C Multimodal Interaction Working Group.
Special acknowledgments to Jim Larson (Intel) and Emily Candell (Comverse) for their significant editorial contributions.
Analysis of use cases provides insight into the requirements for applications likely to require a multimodal infrastructure.
The use cases described below were selected for analysis in order to highlight different requirements resulting from application variations in areas such as device requirements, event handling, network dependencies and methods of user interaction.
Use Case Device Classification
Thin client
A device with little processing power and capability that can be used to capture user input (microphone, touch display, stylus, etc.) as well as non-user input such as GPS. The device may have a very limited capability to interpret the input, for example a small vocabulary speech recognizer or a character recognizer. The bulk of the processing occurs on the server, including natural language processing and interaction management.
An example of such a device may be a mobile phone with DSR capabilities and a visual browser (there could actually be thinner clients than this).
Fat client
A device with powerful processing capabilities, such that most of the processing can occur locally. Such a device is capable of input capture and interpretation. For example, the device may have a medium vocabulary speech recognizer, a handwriting recognizer, natural language processing and interaction management capabilities. The data itself may still be stored on the server.
An example of such a device may be a recent production PDA or an in-car system.
Medium client
A device capable of input capture and some degree of interpretation. The processing is distributed in a client/server or a multi-device architecture. For example, a medium client will have the voice recognition capabilities to handle small vocabulary command and control tasks but connects to a voice server for more advanced dialog tasks.
Use Case Summaries
Form Filling for air travel reservation
Description | Device Classification | Device Details | Execution Model |
The means for a user to reserve a flight using a wireless personal mobile device and a combination of input and output modalities. The dialog between the user and the application is directed through the use of a form-filling paradigm. | Thin and medium clients | touch-enabled display (i.e., supports pen input), voice input, local ASR and Distributed Speech Recognition Framework, local handwriting recognition, voice output, TTS, GPS, wireless connectivity, roaming between various networks. | Client Side Execution |
The user wants to make a flight reservation with his mobile device while he is on the way to work. The user initiates the service by means of making a phone call to a multimodal service (telephone metaphor) or by selecting an application (portal environment metaphor). The details are not described here.
As the user moves between networks with very different characteristics, the user is offered the flexibility to interact using the preferred and most appropriate modes for the situation. For example, while sitting in a train, the use of stylus and handwriting can achieve higher accuracy than speech (due to surrounding noise) and protect privacy. When the user is walking, the input and output modalities that are more appropriate would be voice with some visual output. Finally, at the office the user can use pen and voice in a synergistic way.
The dialog between the user and the application is driven by a form-filling paradigm where the user provides input to fields such as "Travel Origin:", "Travel Destination:", "Leaving on date", "Returning on date". As the user selects each field in the application to enter information, the corresponding input constraints are activated to drive the recognition and interpretation of the user input. The capability of providing composite multimodal input is also examined, where input from multiple modalities is combined for the interpretation of the user's intent.
Driving Directions
Description | Device Classification | Device Details | Execution Model |
This application provides a mechanism for a user to request and receive driving directions via speech and graphical input and output | Medium Client | on-board system (in a car) with a graphical display, map database, touch screen, voice and touch input, speech output, local ASR and TTS processing and GPS. | Client Side Execution |
The user wants to go to a specific address from his current location, and while driving wants to take a detour to a local restaurant (the user does not know the restaurant's address or name). The user initiates the service via a button on his steering wheel and interacts with the system via the touch screen and speech.
Name Dialing
Description | Device Classification | Device Details | Execution Model |
The means for users to call someone by saying their name. | thin and fat devices | Telephone | The study covers several possibilities:
These choices determine the kinds of events that are needed to coordinate the device and network based services. |
Janet presses a button on her multimodal phone and says one of the following commands:
The application initially looks for a match in Janet's personal contact list and, if no match is found, then proceeds to look in other directories. Directed dialog and tapered help are used to narrow down the search, using aural and visual prompts. Janet is able to respond by pressing buttons, by tapping with a stylus, or by using her voice.
Once a selection has been made, rules defined by Wendy are used to determine how the call should be handled. Janet may see a picture of Wendy along with a personalized message (aural and visual) that Wendy has left for her. Call handling may depend on the time of day, the location and status of both parties, and the relationship between them. An "ex" might be told to never call again, while Janet might be told that Wendy will be free in half an hour after Wendy's meeting has finished. The call may be automatically directed to Wendy's home, office or mobile phone, or Janet may be invited to leave a message.
The use-case analysis exercise helped to identify the types of events a multimodal system would likely need to support.
Based on the use case analysis, the following event classifications were defined:
The events from the use cases described above have been consolidated in the following table.
Event Table:
| Event Type | Asynchronous vs. Synchronous | Local vs. remote generation | Local vs. remote handling | Input interpretation | External vs. User | Notifications vs. actions | Comments |
1. | Data Reply Event | Synchronous | Remote | Local | No | External | Notification | Event containing results from a previous data request |
2. | HTTP Request | Asynchronous | Local | Remote | No | External | N/A | A request sent via the HTTP Protocol |
3. | GPS_DATA_in | Synchronous | Remote | Local | No | External | Notification | Event containing GPS Location Data |
4. | Touch Screen Event | Asynchronous | Local | Local | Yes | User | Action | Event that contains coordinates corresponding to a location on a touch screen |
5. | Start_Listening Event | Asynchronous | Local / Remote | Local / Remote | No | User | Action | Event to invoke the speech recognizer |
6. | Return Reco Results | Synchronous | Local / Remote | Local | Yes | External | Notification | Event containing the results of a recognition |
7. | Alert | Asynchronous | Remote | Local | No | External | Notification | Event containing unsolicited data which may be of use to an application |
8. | Register User Ack | Synchronous | Remote | Local | No | External | Notification | Event acknowledging that user has registered with the service |
9. | Call | Asynchronous | Local | Remote | No | User | Action | Request to place an outgoing call |
10. | Call Ack | Synchronous | Remote | Local | No | External | Notification | Event acknowledging request to place an outgoing call |
11. | Leave Message | Asynchronous | Local | Remote | No | User | Action | Request to leave a message |
12. | Message Ack | Synchronous | Remote | Local | No | External | Notification | Event acknowledging request to leave a message |
13. | Send Mail | Asynchronous | Local | Remote | No | User | Action | Request to send a message |
14. | Mail Ack | Synchronous | Remote | Local | No | External | Notification | Event acknowledging request to send a message |
15. | Register_Device_Profile (delivery_context) | Synchronous | Local | Remote | No | External | Notification | Occurs on connection |
16. | Update_Device_Profile (delivery_context) | Asynchronous/ Synchronous | Local | Remote | No | External/ User | Notification | The user selects a new set of modalities by pressing a button or making menu selections (synchronous event). If the device can detect changes in the network or location via GPS or beacons, then the event is asynchronous. |
17. | On_Focus (field_name) | Synchronous | Local | Remote | No | User | Action | Event sends the selected field to the multimodal synchronization server for the purpose of loading the appropriate input constraints for the field. |
18. | Handwriting_Reco () | Synchronous | Local | Local | Yes | User | Action | Event to invoke the handwriting recognizer (HWR) after pen input in a field. In the current scenario, we consider that HWR is handled locally, but this may be expanded later to include remote processing. |
19. | Submit_Partial_Result () | Synchronous | Local | Remote | No | External | Notification | Result of recognition of field input is sent to the server |
20. | Send_Ink (ink_data, time_stamp) | Synchronous | Local | Remote | Yes | User | Action | Ink collected for a pen gesture is sent to the multimodal server for integration. As before, this event associates time stamp information with the ink data for synchronization. The result of the pen gesture can be transmitted as a sequence of (x,y) coordinates relative to the device display. |
21 | Collect_Pen_Input () | Synchronous | Local | Local | Yes | User | Action | Ink collection could be interpreted first locally into basic shapes (i.e. circles, lines) and have those transmitted to the server. |
22 | Send_Gesture (gesture_data, time_stamp) | Synchronous | Local | Remote | Yes | User | Action | The server can provide a deeper semantic interpretation than the basic shapes that are recognized on the client. |
audio-visual speech
Combination of video and audio to process input (joint face/lips/movement recognition and speech recognition) and generate output (audio-visual media).
complementary use of modalities
A use of modalities where the interactions available to the user differ per modality.
composite input
Composite input is input received on multiple modalities at the same time and treated as a single, integrated compound input by downstream processes.
conflicting input
Contradictory inputs provided by the user in different modalities or on different devices. For example, they may indicate different exclusive selections.
context
A session context consists of the history of the interaction between the user and the multimodal system, including the input received from the user, the output presented to the user, the current data model and the sequence of data model changes.
coordination capability
Capability of a multimodal system to combine multimodal inputs into composite inputs based on an interpretation algorithm that decides what makes sense to combine based on the context.
CC/PP [Composite Capability/Preference Profiles]
A W3C working group which is developing an RDF-based framework for the management of device profile information. For more details about the group activity please visit http://www.w3.org/Mobile/CCPP/
concatenation
The text-to-speech engine concatenates short digital-audio segments and performs intersegment smoothing to produce a continuous sound.
CSS
Cascading Stylesheets
data file
Argument files to input or output processing algorithms.
default synchronization behavior
Synchronization behavior supported by default by a multimodal application.
delivery context
A set of attributes that characterizes the capabilities of the access mechanism in terms of device profile, user profile (e.g. identity, preferences and usage patterns) and situation. Delivery context may have static and dynamic components.
device
A piece of hardware used to access and interact with an application.
device profile
A particular subset of the delivery context that describes the device characteristics, including for example device form factor, available modalities, and level of synchronization and coordination.
DI [Device Independence]
The W3C Device Independence Activity is working to ensure seamless Web access with all kinds of devices, and worldwide standards for the benefit of Web users and content providers alike. For more details please refer to http://www.w3.org/2001/di/
digital ink
Stored or recognized handwriting input.
directed dialog
A dialog in which one party (the user or the computer) follows a pre-selected path, independent of the responses of the other. (cf. mixed initiative dialog)
distributed components
System components may live at various points of the network, including the local client.
DOM [Document Object Model]
A standard interface to the contents of a web page. Please visit http://www.w3.org/DOM/ for more details.
EMMA
Extensible MultiModal Annotation Markup Language. Formerly known as NLSML (Natural Language Semantics Markup Language). This markup language is intended for use by systems to represent semantic interpretations for a variety of inputs, including but not necessarily limited to, speech and natural language text input.
event
An event is a representation of some asynchronous occurrence of interest to the multimodal system. Examples include mouse clicks, hanging up the phone, and speech recognition errors. Events may be associated with data, e.g. the location where the mouse was clicked.
event handler
A software object intended to interpret and respond to a given class of events.
event source
An agent (human or software) capable of generating events.
execution model (configuration)
Runtime configuration of the various system components in a particular manifestation of a multimodal system.
external events
External input events are events that do not originate from direct user input. External output events are events that originate in the multimodal system and are handled by other processes.
GPS [Global Positioning System]
A worldwide radio-navigation system formed from a constellation of 24 satellites and their ground stations. GPS uses these "man-made stars" as reference points to calculate positions accurate to a matter of meters.
grammar
A computational mechanism that defines a finite or infinite set of legal strings, usually with some structure.
handwriting
Use of the pen for input which is converted into text or symbols. Involves handwriting recognition.
history
Portions of the profile and session context persisted for the same user across sessions.
HTML [HyperText Markup Language]
A simple markup language used to create hypertext documents that are portable from one platform to another. To find more information about the specification of HTML and the working group activity please visit http://www.w3c.org/MarkUp/
HTTP [Hypertext Transfer Protocol]
To get details about the HTTP working group and the HTTP specification please visit http://www.w3c.org/Protocols/.
human language
Any spoken language (e.g. French, Japanese, English, etc.).
ink
See digital ink.
input event
Event, set of events or macro-event generated by a user interaction in a particular modality on a particular device.
input constraints
Specify how inputs can be combined via rules or interaction management strategies. For example, the markup language may coordinate grammars for modalities other than speech with speech grammars to avoid duplication of effort in authoring multimodal grammars.
input processing
Algorithm applied to a particular input in order to transform or extract information from it (e.g. filtering, speech recognition, speaker recognition, NL parsing, ...). The algorithm may rely on data files as arguments (e.g. grammar, acoustic model, NL models, ...).
interaction manager
An interaction manager generates or updates the presentation by processing user inputs, session context and possibly other external knowledge sources to determine the intent of the user. An interaction manager relies on strategies to determine focus and intent as well as to disambiguate, correct and confirm sub-dialogs. We typically distinguish directed dialogs (e.g. user-driven or application-driven) and mixed initiative or free flow dialogs.
lip synchronization (lipsynch)
Output media where at least a face has lip movements synchronized with output audio speech.
markup components
XML vocabularies that provide markup-level access to various system components.
media synchronization
Synchronization between output media as specified by SMIL: http://www.w3.org/AudioVideo/
medium
A description that can be rendered into physical effects that can be perceived and interacted with by the user in one or multiple modalities and on one or multiple devices.
MIDI
Musical Instrument Digital Interface, an audioformat.
A style of dialog where both parties (the computer and the user)can control what is talked about and when. A party may on its ownchange the course of the interaction (e.g., by asking questions,providing more or less information than what was requested ormaking digressions). Mixed initiative dialog is contrasted withdirected dialog where only one party controls the conversation. (cfdirected dialog)
MMI [Multimodal Interaction]
A W3C Working Group which is developing markup specifications that extend the Web user interface to allow multiple modes of interaction. For more details of the MMI working group and MMI activity, please visit http://www.w3c.org/2002/mmi/
The type of communication channel used for interaction. It also covers the way an idea is expressed or perceived, or the manner in which an action is performed.
Change of modality to perform a particular interaction. It can be decided by the user or imposed by the application or runtime (e.g. when a phone call drops).
MPEG
Working group established under the joint direction of the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC), whose goal is to create standards for digital video and audio compression. More precisely, MPEG defines the syntax of audio and video formats requiring low data rates, as well as the operations to be undertaken by decoders.
MP3 [MPEG Audio Layer-3]
An Internet music format. For MP3 related technologies please refer to http://www.mp3-tech.org/
A multimodal system supports communication with the user through different modalities such as voice, gesture, and typing. (cf. modality)
must specify
A "must specify" requirement must be satisfied by the multimodal specification(s), starting from their very first version.
natural language (NL)
Term used for human language, as opposed to artificial languages (such as computer programming languages or those based on mathematical logic). A processor capable of handling NL must typically be able to deal with a flexible set of sentences.
natural language generation (NLG)
A technique for generating natural language sentences based on some higher-level information. Generation by template is an example of a simple language generation technique. "The flight from <departure-city> to <arrival-city> leaves at <departure-time>" is an example of a template where the slots indicated by <…> have to be filled with the appropriate information by a higher-level process.
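As an illustration only (mirroring the template above; the slot names and function are assumptions, not mandated by any specification), template-based generation can be as simple as string substitution performed by a higher-level process:

    # Hypothetical template-based natural language generation.
    TEMPLATE = ("The flight from {departure_city} to {arrival_city} "
                "leaves at {departure_time}.")

    def generate(slots: dict) -> str:
        """Fill the template slots with values supplied by a higher-level process."""
        return TEMPLATE.format(**slots)

    print(generate({"departure_city": "Paris",
                    "arrival_city": "Tokyo",
                    "departure_time": "10:45"}))
    # -> The flight from Paris to Tokyo leaves at 10:45.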
natural language processing
Natural language understanding, generation, translation and other transformations on human language.
natural language understanding (NLU)
The process of interpreting natural language phrases to specify their meaning, typically as a formula in formal logic.
nice to specify
A "nice to specify" requirement will be taken into account when designing the specification. If a technical solution is available, the specifications will try to satisfy the requirement or support the feature, provided that it does not excessively delay the work plan.
The act of communicating an event (see subscribe).
override mechanism for synchronization
Information that specifies how the synchronization should behave when not following its default behavior. (cf. default synchronization)
output generation
Expressing information to be conveyed in a user-friendly form, possibly using multiple output media streams.
Algorithm to apply in order to transform or generate an output (e.g. TTS, NLG).
semantics
The meaning or interpretation of a word, phrase, or sentence, as opposed to its syntactic form. In natural language and dialog technology the term semantics is typically used to indicate a representation of a phrase or a sentence whose elements can be related to entities of the application (e.g. departure airport and arrival time for a flight application), or dialog acts (e.g. request for help, repeat, etc.).
The process of interpreting the semantic part of a grammar. The result of the interpretation is a semantic representation. This process is often referred to as Semantic Tagging.
The semantic result of parsing a written sentence or a spoken utterance. The semantic interpretation can be expressed as attribute-value pairs or more complex structures. W3C is working on the definition of a Semantic Representation formalism.
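Purely as an illustration (the attribute names below are hypothetical and not taken from any W3C formalism), the semantic representation of a spoken request such as "I want to fly from Paris to Tokyo tomorrow" could be expressed as attribute-value pairs:

    # Hypothetical semantic representation as attribute-value pairs.
    semantic_representation = {
        "intent": "book_flight",       # dialog act / application entity
        "departure_city": "Paris",
        "arrival_city": "Tokyo",
        "departure_date": "tomorrow",  # still to be resolved to an absolute date
    }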
sequential input
A sequential input is one received on a single modality. The modality may change over time. (cf. simultaneous or composite input)
sequential multimodal application
A sequential multimodal application is one in which the user may interact with the application in only one modality at a time, switching between modalities as needed.
session
The time interval during which an application and its context is associated with a user and persisted. Within a session, users may suspend and resume interaction with an application within the same modality or device, or switch modality or device.
session level synchronizationgranularity
Granularity at which a multimodal application supports suspend and resume behavior across modalities.
should specify
The specifications (multimodal markup language and other) will aim at addressing and satisfying the requirement or supporting the features during the lifetime of the working group. Early specifications will take this into account to allow easy and interoperable updates.
simultaneous input
Simultaneous inputs denote inputs that can come from different modalities but are not combined into composite inputs. Simultaneous multimodal inputs imply that the inputs from several modalities are interpreted one after the other, in the order in which they were received, instead of being combined before interpretation.
External information that can affect the usage or expected behavior of multimodal applications, including for example on-going activities (e.g. walking versus driving), environment (e.g. noisy), privacy (e.g. alone versus in public), etc.
SMIL [Synchronized Multimedia Integration Language]
A W3C Recommendation, SMIL 2.0 enables simple authoring of interactive audiovisual applications. See http://www.w3.org/TR/smil20/ for details.
speech recognition
The ability of a computer to understand the spoken word for the purpose of receiving command and data input from the speaker.
speech recognition engine
A software/hardware component that performs recognition from a digital-audio stream. Speech recognition engines are supplied by vendors who specialize in the software.
subscribe
The act of informing an event source that you want to be notified of some class of events.
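For illustration only (the class and method names below are assumptions, not drawn from any W3C specification), subscription and notification can be sketched as a simple publish/subscribe pattern:

    # Hypothetical event source: handlers subscribe to a class of events
    # and are notified whenever an event of that class occurs.
    class EventSource:
        def __init__(self):
            self._handlers = {}  # event class -> list of subscribed handlers

        def subscribe(self, event_class, handler):
            self._handlers.setdefault(event_class, []).append(handler)

        def notify(self, event_class, data=None):
            for handler in self._handlers.get(event_class, []):
                handler(data)

    source = EventSource()
    source.subscribe("mouse_click", lambda data: print("clicked at", data))
    source.notify("mouse_click", (120, 45))  # -> clicked at (120, 45)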
supplementary use of modalities
Describes multimodal applications in which every interaction (input or output) can be carried through in each modality as if it was the only available modality.
Suspend and resume behavior; an application suspended in one modality can be resumed in the same or another modality.
The way that an input in one modality is reflected in the output in another modality/device, as well as the way that it may be combined across modalities (coordination capability).
synchronization granularity or level
The text-to-speech engine synthesizes the glottal pulse from human vocal cords and applies various filters to simulate throat length, mouth cavity, lip shape, and tongue position.
Technologies for converting textual (ASCII) information into synthetic speech output. Used in voice-processing applications requiring production of broad, unrelated, and unpredictable vocabularies, such as products in a catalog or names and addresses. This technology is appropriate when system design constraints prevent the more efficient use of speech concatenation alone.
Annotation of an event that characterizes the relative (with respect to an agreed-upon reference) or absolute time of occurrence of the event.
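As an illustrative sketch only (the field names are hypothetical), an input event might carry both an absolute time of occurrence and a time relative to an agreed-upon reference such as the start of the session:

    import time

    # Hypothetical time-stamped event record.
    session_start = time.time()  # agreed-upon reference point

    def make_event(event_type, data=None):
        now = time.time()
        return {
            "type": event_type,
            "data": data,
            "absolute_time": now,                  # seconds since the Unix epoch
            "relative_time": now - session_start,  # seconds since session start
        }

    event = make_event("mouse_click", (120, 45))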
TTS
text-to-speech
turn
Set of inputs collected from the user before updating the output.
URI
Uniform Resource Identifier - http://www.w3.org/Addressing/
A particular subset of the delivery context that describes the user, including for example the identity, personal information, personal preferences and usage preferences.
XML Events
An XML Events module that provides XML languages with the ability to uniformly integrate event listeners and associated event handlers with DOM Level 2 event interfaces. The result is to provide an interoperable way of associating behaviors with document-level markup. For the XML Events specification, please visit http://www.w3.org/TR/2001/WD-xml-events-20011026/Overview.html#s_intro
XSL
Extensible Stylesheet Language
XSLT
Extensible Stylesheet Language Transformations