Movatterモバイル変換

Voice Extensible Markup Language(VoiceXML) Version 2.0

W3C Recommendation 16 March 2004

This Version:: http://www.w3.org/TR/2004/REC-voicexml20-20040316/
Latest Version:: http://www.w3.org/TR/voicexml20/
Previous Version:: http://www.w3.org/TR/2004/PR-voicexml20-20040203/
Editors:: Scott McGlashan, Hewlett-Packard (Editor-in-Chief) Daniel C. Burnett, Nuance Communications
Jerry Carter, Invited Expert
Peter Danielsen, Lucent (until October 2002)
Jim Ferrans, Motorola
Andrew Hunt, ScanSoft
Bruce Lucas, IBM
Brad Porter, Tellme Networks
Ken Rehor, Vocalocity
Steph Tryphonas, Tellme Networks

Please refer to theerratafor this document, which may include some normative corrections.

Abstract

This document specifies VoiceXML, the Voice Extensible MarkupLanguage. VoiceXML is designed for creating audio dialogs thatfeature synthesized speech, digitized audio, recognition ofspoken and DTMF key input, recording of spoken input, telephony,and mixed initiative conversations. Its major goal is to bringthe advantages of Web-based development and content delivery tointeractive voice response applications.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in theW3C technical reports index at http://www.w3.org/TR/.

This document has been reviewed by W3C Members and otherinterested parties, and it has been endorsed by the Directoras aW3CRecommendation. W3C's role in making the Recommendation is todraw attention to the specification and to promote its widespreaddeployment. This enhances the functionaility and interoperabilityof the Web.

This specification is part of the W3C Speech Interface Frameworkand has been developed within theW3C Voice Browser Activity by participants intheVoice Browser WorkingGroup (W3CMembers only).

The design of VoiceXML 2.0 has been widely reviewed (see thedisposition of comments) and satisfies the Working Group'stechnical requirements.A list of implementations is included in theVoiceXML 2.0 implementation report, along with the associated test suite.

Comments are welcome onwww-voice@w3.org (archive).SeeW3C mailing list and archive usageguidelines.

The W3C maintains a list ofany patentdisclosures related to this work.

Conventions of this Document

In this document, the key words "must", "must not","required", "shall", "shall not", "should", "should not","recommended", "may", and "optional" are to be interpreted asdescribed in[RFC2119]and indicate requirement levels for compliant VoiceXMLimplementations.

Abbreviated Contents

Full Contents

1.Overview
- 1.1Introduction
- 1.2Background
  - 1.2.1Architectural Model
  - 1.2.2Goalsof VoiceXML
  - 1.2.3Scopeof VoiceXML
  - 1.2.4Principles of Design
  - 1.2.5Implementation Platform Requirements
- 1.3Concepts
  - 1.3.1Dialogs and Subdialogs
  - 1.3.2Sessions
  - 1.3.3Applications
  - 1.3.4Grammars
  - 1.3.5Events
  - 1.3.6Links
- 1.4VoiceXMLElements
- 1.5DocumentStructure and Execution
  - 1.5.1Execution within one Document
  - 1.5.2Executing a Multi-Document Application
  - 1.5.3Subdialogs
  - 1.5.4FinalProcessing
2.DialogConstructs
- 2.1Forms
  - 2.1.1FormInterpretation
  - 2.1.2FormItems
  - 2.1.3FormItem Variables and Conditions
  - 2.1.4Directed Forms
  - 2.1.5MixedInitiative Forms
  - 2.1.6FormInterpretation Algorithm
- 2.2Menus
  - 2.2.1menu element
  - 2.2.2choice element
  - 2.2.3DTMFin Menus
  - 2.2.4enumerate element
  - 2.2.5Grammar Generation
  - 2.2.6Interpretation Model
- 2.3FormItems
  - 2.3.1field element
  - 2.3.2block element
  - 2.3.3initial element
  - 2.3.4subdialog element
  - 2.3.5object element
  - 2.3.6record element
  - 2.3.7transfer element
- 2.4Filled
- 2.5Links
3.UserInput
- 3.1Grammars
  - 3.1.1SpeechGrammars
  - 3.1.2DTMFGrammars
  - 3.1.3Scopeof Grammars
  - 3.1.4Activation of Grammars
  - 3.1.5Semantic Interpretation of Input
  - 3.1.6Mapping Semantic Interpretation Results to VoiceXMLforms
4.SystemOutput
- 4.1Prompt
  - 4.1.1SpeechMarkup
  - 4.1.2BasicPrompts
  - 4.1.3AudioPrompting
  - 4.1.4<value> Element
  - 4.1.5Bargein
  - 4.1.6PromptSelection
  - 4.1.7Timeout
  - 4.1.8PromptQueueing and Input Collection
5.Control flowand scripting
- 5.1Variablesand Expressions
  - 5.1.1Declaring Variables
  - 5.1.2Variable Scopes
  - 5.1.3Referencing Variables
  - 5.1.4Standard Session Variables
  - 5.1.5Standard Application Variables
- 5.2EventHandling
  - 5.2.1throw element
  - 5.2.2catch element
  - 5.2.3Shorthand Notation
  - 5.2.4catchElement Selection
  - 5.2.5Default catch elements
  - 5.2.6EventTypes
- 5.3ExecutableContent
  - 5.3.1var element
  - 5.3.2assign element
  - 5.3.3clear element
  - 5.3.4if,elseif, else elements
  - 5.3.5prompts
  - 5.3.6reprompt element
  - 5.3.7goto element
  - 5.3.8submit element
  - 5.3.9exit element
  - 5.3.10return element
  - 5.3.11disconnect element
  - 5.3.12script element
  - 5.3.13log element
6.Environmentand Resources
- 6.1ResourceFetching
  - 6.1.1Fetching
  - 6.1.2Caching
  - 6.1.3Prefetching
  - 6.1.4Protocols
- 6.2MetadataInformation
  - 6.2.1meta element
  - 6.2.2metadata element
- 6.3property element
  - 6.3.1Platform-Specific Properties
  - 6.3.2Generic Speech Recognizer Properties
  - 6.3.3Generic DTMF Recognizer Properties
  - 6.3.4Promptand Collect Properties
  - 6.3.5Fetching Properties
  - 6.3.6Miscellaneous Properties
- 6.4param element
- 6.5Value Designations
Appendices
- Appendix A.Glossary of Terms
- Appendix B.VoiceXML Document Type Definition
- Appendix C.Form Interpretation Algorithm
- Appendix D.Timing Properties
- Appendix E.Audio File Formats
- Appendix F.Conformance
- Appendix G.Internationalization
- Appendix H.Accessibility
- Appendix I.Privacy
- Appendix J.Changes from VoiceXML 1.0
- Appendix K.Reusability
- Appendix L.Acknowledgements
- Appendix M.References
- Appendix N.Media Type and File Suffix
- Appendix O.VoiceXML XML Schema Definition
- Appendix P.Builtin Grammar Types

1.Overview

This document defines VoiceXML, the Voice Extensible MarkupLanguage. Its background, basic concepts and use are presented inSection 1. The dialogconstructs of form, menu and link, and the mechanism (FormInterpretation Algorithm) by which they are interpreted are thenintroduced inSection 2. Userinput using DTMF and speech grammars is covered inSection 3, whileSection 4 covers system output using speechsynthesis and recorded audio. Mechanisms for manipulating dialogcontrol flow, including variables, events, and executableelements, are explained inSection5. Environment features such as parameters and properties aswell as resource handling are specified inSection 6. The appendices provide additionalinformation including theVoiceXML Schema, a detailed specification of theForm Interpretation Algorithmandtiming,audio file formats, andstatements relating toconformance,internationalization,accessibility andprivacy.

The origins of VoiceXML began in 1995 as an XML-based dialogdesign language intended to simplify the speech recognitionapplication development process within an AT&T project calledPhone Markup Language (PML). As AT&T reorganized, teams atAT&T, Lucent and Motorola continued working on their ownPML-like languages.

In 1998, W3C hosted a conference on voice browsers. By thistime, AT&T and Lucent had different variants of theiroriginal PML, while Motorola had developed VoxML, and IBM wasdeveloping its own SpeechML. Many other attendees at theconference were also developing similar languages for dialogdesign; for example, such as HP's TalkML and PipeBeach'sVoiceHTML.

The VoiceXML Forum was then formed by AT&T, IBM, Lucent,and Motorola to pool their efforts. The mission of the VoiceXMLForum was to define a standard dialog design language thatdevelopers could use to build conversational applications. Theychose XML as the basis for this effort because it was clear tothem that this was the direction technology was going.

In 2000, the VoiceXML Forum released VoiceXML 1.0 to thepublic. Shortly thereafter, VoiceXML 1.0 was submitted to the W3Cas the basis for the creation of a new international standard.VoiceXML 2.0 is the result of this work based on input from W3CMember companies, other W3C Working Groups, and the public.

Developers familiar with VoiceXML 1.0 are particularly directedtoChanges from PreviousPublic Version which summarizes how VoiceXML 2.0 differs fromVoiceXML 1.0.

1.1Introduction

VoiceXML is designed for creating audio dialogs that featuresynthesized speech, digitized audio, recognition of spoken andDTMF key input, recording of spoken input, telephony, andmixed initiative conversations. Its major goal is to bring theadvantages of Web-based development and content delivery tointeractive voice response applications.

Here are two short examples of VoiceXML. The first is thevenerable "Hello World":

<?xml version="1.0" encoding="UTF-8"?> <vxml xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"   version="2.0">  <form>    <block>Hello World!</block>  </form></vxml>

The top-level element is <vxml>, which is mainly acontainer fordialogs. There are two types of dialogs:forms andmenus. Forms present information andgather input; menus offer choices of what to do next. Thisexample has a single form, which contains a block thatsynthesizes and presents "Hello World!" to the user. Since theform does not specify a successor dialog, the conversationends.

Our second example asks the user for a choice of drink andthen submits it to a server script:

<?xml version="1.0" encoding="UTF-8"?><vxml xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"   version="2.0">  <form>  <field name="drink">     <prompt>Would you like coffee, tea, milk, or nothing?</prompt>     <grammar src="drink.grxml" type="application/srgs+xml"/>  </field>  <block>     <submit next="http://www.drink.example.com/drink2.asp"/>  </block> </form></vxml>

Afield is an input field. The user must provide avalue for the field before proceeding to the next element in theform. A sample interaction is:

C(computer): Would you likecoffee, tea, milk, or nothing?
H(human): Orange juice.
C: I did not understand what you said.(aplatform-specific default message.)
C: Would you like coffee, tea, milk, or nothing?
H: Tea
C:(continues in documentdrink2.asp)

1.2Background

This section contains a high-level architectural model, whoseterminology is then used to describe the goals of VoiceXML, itsscope, its design principles, and the requirements it places onthe systems that support it.

1.2.1Architectural Model

The architectural model assumed by this document has thefollowing components:

VoiceXML interpreter fits between document server and implementation platform
Figure 1: Architectural Model

Adocument server (e.g. a Web server) processesrequests from a client application,the VoiceXMLInterpreter, through theVoiceXML interpreter context.The server producesVoiceXML documents in reply, which areprocessed by the VoiceXML interpreter. The VoiceXML interpretercontext may monitor user inputs in parallel with the VoiceXMLinterpreter. For example, one VoiceXML interpreter context mayalways listen for a special escape phrase that takes the user toa high-level personal assistant, and another may listen forescape phrases that alter user preferences like volume ortext-to-speech characteristics.

Theimplementation platform is controlled by theVoiceXML interpreter context and by the VoiceXML interpreter. Forinstance, in an interactive voice response application, theVoiceXML interpreter context may be responsible for detecting anincoming call, acquiring the initial VoiceXML document,and answering the call, while the VoiceXML interpreter conductsthe dialog after answer. The implementation platform generatesevents in response to user actions (e.g. spoken or characterinput received, disconnect) and system events (e.g. timerexpiration). Some of these events are acted upon by the VoiceXMLinterpreter itself, as specified by the VoiceXML document, whileothers are acted upon by the VoiceXML interpreter context.

1.2.2 Goals of VoiceXML

VoiceXML's main goal is to bring the full power of Webdevelopment and content delivery to voice response applications,and to free the authors of such applications from low-levelprogramming and resource management. It enables integration ofvoice services with data services using the familiarclient-server paradigm. A voice service is viewed as a sequenceof interaction dialogs between a user and an implementationplatform. The dialogs are provided by document servers, which maybe external to the implementation platform. Document serversmaintain overall service logic, perform database and legacysystem operations, and produce dialogs. A VoiceXML documentspecifies each interaction dialog to be conducted by a VoiceXMLinterpreter. User input affects dialog interpretation and iscollected into requests submitted to a document server. Thedocument server replies with another VoiceXML document tocontinue the user's session with other dialogs.

VoiceXML is a markup language that:

Minimizes client/server interactions by specifying multipleinteractions per document.
Shields application authors from low-level, andplatform-specific details.
Separates user interaction code (in VoiceXML) from servicelogic (e.g. CGI scripts).
Promotes service portability across implementation platforms.VoiceXML is a common language for content providers, toolproviders, and platform providers.
Is easy to use for simple interactions, and yet provideslanguage features to support complex dialogs.

While VoiceXML strives to accommodate the requirements of amajority of voice response services, services with stringentrequirements may best be served by dedicated applications thatemploy a finer level of control.

1.2.3 Scope of VoiceXML

The language describes the human-machine interaction providedby voice response systems, which includes:

Output of synthesized speech (text-to-speech).
Output of audio files.
Recognition of spoken input.
Recognition of DTMF input.
Recording of spoken input.
Control of dialog flow.
Telephony features such as call transfer and disconnect.

The language provides means for collecting character and/orspoken input, assigning the input results to document-definedrequest variables, and making decisions that affect theinterpretation of documents written in the language. A documentmay be linked to other documents through Universal ResourceIdentifiers (URIs).

1.2.4 Principles of Design

VoiceXML is an XML application[XML].

The language promotes portability of services throughabstraction of platform resources.
The language accommodates platform diversity in supportedaudio file formats, speech grammar formats, and URI schemes.While producers of platforms may support various grammar formatsthe language requires a common grammar format, namely the XMLForm of the W3C Speech Recognition Grammar Specification[SRGS], to facilitateinteroperability. Similarly, while various audio formats forplayback and recording may be supported, the audio formatsdescribed inAppendixE must be supported
The language supports ease of authoring for common types ofinteractions.
The language has well-defined semantics that preserves theauthor's intent regarding the behavior of interactions with theuser. Client heuristics are not required to determine documentelement interpretation.
The language recognizes semantic interpretations from grammarsand makes this information available to the application.
The language has a control flow mechanism.
The language enables a separation of service logic frominteraction behavior.
It is not intended for intensive computation, databaseoperations, or legacy system operations. These are assumed to behandled by resources outside the document interpreter, e.g. adocument server.
General service logic, state management, dialog generation,and dialog sequencing are assumed to reside outside the documentinterpreter.
The language provides ways to link documents using URIs, andalso to submit data to server scripts using URIs.
VoiceXML provides ways to identify exactly which data tosubmit to the server, and which HTTP method (GET or POST) to usein the submittal.
The language does not require document authors to explicitlyallocate and deallocate dialog resources, or deal withconcurrency. Resource allocation and concurrent threads ofcontrol are to be handled by the implementation platform.

1.2.5 Implementation Platform Requirements

This section outlines the requirements on thehardware/software platforms that will support a VoiceXMLinterpreter.

Document acquisition. The interpreter context isexpected to acquire documents for the VoiceXML interpreter to acton. The "http" URI scheme must be supported. In some cases, thedocument request is generated by the interpretation of a VoiceXMLdocument, while other requests are generated by the interpretercontext in response to events outside the scope of the language,for example an incoming phone call. When issuing documentrequests via http, the interpreter context identifies itselfusing the "User-Agent" header variable with the value"<name>/<version>", for example,"acme-browser/1.2"

Audio output. An implementation platform must supportaudio output using audio files and text-to-speech (TTS). Theplatform must be able to freely sequence TTS and audio output. Ifan audio output resource is not available, an error.noresourceevent must be thrown. Audio files are referred to by a URI. Thelanguage specifies a required set of audio file formats whichmust be supported (seeAppendix E); additional audio file formats mayalso be supported.

Audio input. An implementation platform is required todetect and report character and/or spoken input simultaneouslyand to control input detection interval duration with a timerwhose length is specified by a VoiceXML document. If an audioinput resource is not available, an error.noresource event mustbe thrown.

It must reportcharacters (for example, DTMF) enteredby a user. Platforms must support the XML form of DTMF grammarsdescribed in the W3C Speech Recognition Grammar Specification[SRGS]. They should alsosupport the Augmented BNF (ABNF) form of DTMF grammars describedin the W3C Speech Recognition Grammar Specification[SRGS].
It must be able to receivespeech recognition grammardata dynamically. It must be able to use speech grammar data inthe XML Form of the W3C Speech Recognition Grammar Specification[SRGS]. It should be able toreceive speech recognition grammar data in the ABNF form of theW3C Speech Recognition Grammar Specification[SRGS], and may support other formats such asthe JSpeech Grammar Format[JSGF] or proprietary formats. Some VoiceXMLelements contain speech grammar data; others refer to speechgrammar data through a URI. The speech recognizer must be able toaccommodate dynamic update of the spoken input for which it islistening through either method of speech grammar dataspecification.
It must be able torecord audio received from the user.The implementation platform must be able to make the recordingavailable to arequest variable. The language specifies arequired set of recorded audio file formats which must besupported (seeAppendixE); additional formats may also be supported.

Transfer The platform should be able to support makinga third party connection through a communications network, suchas the telephone.

1.3Concepts

A VoiceXMLdocument (or a set of related documentscalled anapplication) forms a conversational finite statemachine. The user is always in one conversational state, ordialog, at a time. Each dialog determines the next dialogto transition to.Transitions are specified using URIs,which define the next document and dialog to use. If a URI doesnot refer to a document, the current document is assumed. If itdoes not refer to a dialog, the first dialog in the document isassumed. Execution is terminated when a dialog does not specifya successor, or if it has an element that explicitly exits theconversation.

1.3.1 Dialogs and Subdialogs

There are two kinds of dialogs:forms andmenus.Forms define an interaction that collects values for a set ofform item variables. Each field may specify a grammar thatdefines the allowable inputs for that field. If a form-levelgrammar is present, it can be used to fill several fields fromone utterance. A menu presents the user with a choice of optionsand then transitions to another dialog based on that choice.

Asubdialog is like a function call, in that itprovides a mechanism for invoking a new interaction, andreturning to the original form. Variable instances, grammars, andstate information are saved and are available upon returning tothe calling document. Subdialogs can be used, for example, tocreate a confirmation sequence that may require a database query;to create a set of components that may be shared among documentsin a single application; or to create a reusable library ofdialogs shared among many applications.

1.3.2 Sessions

Asession begins when the user starts to interact witha VoiceXML interpreter context, continues as documents are loadedand processed, and ends when requested by the user, a document,or the interpreter context.

1.3.3 Applications

Anapplication is a set of documents sharing the sameapplication root document. Whenever the user interactswith a document in an application, its application root documentis also loaded. The application root document remains loadedwhile the user is transitioning between other documents in thesame application, and it is unloaded when the user transitions toa document that is not in the application. While it is loaded,the application root document's variables are available to theother documents as application variables, and its grammars remainactive for the duration of the application, subject to the grammaractivation rules discussed inSection 3.1.4.

Figure 2 shows the transition of documents (D) in anapplication that share a common application root document(root).

root over sequence of 3 documents
Figure 2: Transitioning between documents in an application.

1.3.4 Grammars

Each dialog has one or more speech and/or DTMFgrammarsassociated with it. Inmachine directed applications, eachdialog's grammars are active only when the user is in thatdialog. Inmixed initiative applications, where the userand the machine alternate in determining what to do next, some ofthe dialogs are flagged to make their grammarsactive(i.e., listened for) even when the user is in another dialog inthe same document, or on another loaded document in the sameapplication. In this situation, if the user says somethingmatching another dialog's active grammars, executiontransitions to that other dialog, with the user's utterancetreated as if it were said in that dialog. Mixed initiative addsflexibility and power to voice applications.

1.3.5 Events

VoiceXML provides a form-filling mechanism for handling"normal" user input. In addition, VoiceXML defines a mechanismfor handling events not covered by the form mechanism.

Events are thrown by the platform under a variety ofcircumstances, such as when the user does not respond, doesn'trespond intelligibly, requests help, etc. The interpreter alsothrows events if it finds a semantic error in a VoiceXMLdocument. Events are caught by catch elements or their syntacticshorthand. Each element in which an event can occur may specifycatch elements. Furthermore, catch elements are also inheritedfrom enclosing elements "as if by copy". In this way, commonevent handling behavior can be specified at any level, and itapplies to all lower levels.

1.3.6 Links

Alink supports mixed initiative. It specifies agrammar that is active whenever the user is in the scope of thelink. If user input matches the link's grammar, controltransfers to the link's destination URI. A link can be usedto throw an event or go to a destination URI.

1.4VoiceXML Elements

Table 1: VoiceXML Elements
Element	Purpose	Section
<assign>	Assign a variable a value	5.3.2
<audio>	Play an audio clip within aprompt	4.1.3
<block>	A container of (non-interactive)executable code	2.3.2
<catch>	Catch an event	5.2.2
<choice>	Define a menu item	2.2.2
<clear>	Clear one or more form itemvariables	5.3.3
<disconnect>	Disconnect a session	5.3.11
<else>	Used in <if> elements	5.3.4
<elseif>	Used in <if> elements	5.3.4
<enumerate>	Shorthand for enumerating the choicesin a menu	2.2.4
<error>	Catch an error event	5.2.3
<exit>	Exit a session	5.3.9
<field>	Declares an input field in aform	2.3.1
<filled>	An action executed when fields arefilled	2.4
<form>	A dialog for presenting informationand collecting data	2.1
<goto>	Go to another dialog in the same ordifferent document	5.3.7
<grammar>	Specify a speech recognition or DTMFgrammar	3.1
<help>	Catch a help event	5.2.3
<if>	Simple conditional logic	5.3.4
<initial>	Declares initial logic upon entryinto a (mixed initiative) form	2.3.3
<link>	Specify a transition common to alldialogs in the link's scope	2.5
<log>	Generate a debug message	5.3.13
<menu>	A dialog for choosing amongstalternative destinations	2.2.1
<meta>	Define a metadata item as aname/value pair	6.2.1
<metadata>	Define metadata information using ametadata schema	6.2.2
<noinput>	Catch a noinput event	5.2.3
<nomatch>	Catch a nomatch event	5.2.3
<object>	Interact with a custom extension	2.3.5
<option>	Specify an option in a<field>	2.3.1.3
<param>	Parameter in <object> or<subdialog>	6.4
<prompt>	Queue speech synthesis and audiooutput to the user	4.1
<property>	Control implementation platformsettings.	6.3
<record>	Record an audio sample	2.3.6
<reprompt>	Play a field prompt when a field isre-visited after an event	5.3.6
<return>	Return from a subdialog.	5.3.10
<script>	Specify a block of ECMAScriptclient-side scripting logic	5.3.12
<subdialog>	Invoke another dialog as a subdialogof the current one	2.3.4
<submit>	Submit values to a documentserver	5.3.8
<throw>	Throw an event.	5.2.1
<transfer>	Transfer the caller to anotherdestination	2.3.7
<value>	Insert the value of an expression ina prompt	4.1.4
<var>	Declare a variable	5.3.1
<vxml>	Top-level element in each VoiceXMLdocument	1.5.1

1.5Document Structure and Execution

A VoiceXML document is primarily composed of top-levelelements calleddialogs. There are two types of dialogs:forms andmenus. A document may also have<meta> and <metadata> elements, <var> and<script> elements, <property> elements, <catch>elements, and <link> elements.

1.5.1 Execution within One Document

Document execution begins at the first dialog by default. Aseach dialog executes, it determines the next dialog. When adialog doesn't specify a successor dialog, documentexecution stops.

Here is "Hello World!" expanded to illustrate some of this. Itnow has a document level variable called "hi" which holds thegreeting. Its value is used as the prompt in the first form. Oncethe first form plays the greeting, it goes to the form named"say_goodbye", which prompts the user with "Goodbye!" Because thesecond form does not transition to another dialog, it causes thedocument to be exited.

<?xml version="1.0" encoding="UTF-8"?><vxml xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"   version="2.0"> <meta name="author" content="John Doe"/> <meta name="maintainer" content="hello-support@hi.example.com"/> <var name="hi" expr="'Hello World!'"/> <form>  <block>     <value expr="hi"/>     <goto next="#say_goodbye"/>  </block> </form> <form>  <block>     Goodbye!  </block> </form></vxml>

Alternatively the forms can be combined:

<?xml version="1.0" encoding="UTF-8"?><vxml xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"   version="2.0"> <meta name="author" content="John Doe"/> <meta name="maintainer" content="hello-support@hi.example.com"/> <var name="hi" expr="'Hello World!'"/> <form>  <block>     <value expr="hi"/> Goodbye!  </block> </form></vxml>

Attributes of <vxml> include:

Table 2: <vxml>Attributes
version	The version of VoiceXML of thisdocument (required). The current version number is 2.0.
xmlns	The designated namespace for VoiceXML(required). The namespace for VoiceXML is defined to behttp://www.w3.org/2001/vxml.
xml:base	The base URI for this document asdefined in[XML-BASE].As in[HTML], a URI whichall relative references within the document take as theirbase.
xml:lang	Thelanguage identifier for this document .If omitted, the value is a platform-specific default.
application	The URI of this document'sapplication root document, if any.

Language information is inherited down the document hierarchy:the value of "xml:lang" is inherited by elements which alsodefine the "xml:lang" attribute, such as <grammar> and<prompt>, unless these elements specify an alternativevalue.

1.5.2 Executing a Multi-DocumentApplication

Normally, each document runs as an isolated application. Incases where you want multiple documents to work together as oneapplication, you select one document to be theapplicationroot document, and the rest to beapplication leafdocuments. Each leaf document names the root document in its<vxml> element.

When this is done, every time the interpreter is told to loadand execute a leaf document in this application, it first loadsthe application root document if it is not already loaded. Theapplication root document remains loaded until the interpreter istold to load a document that belongs to a different application.Thus one of the following two conditions always holds duringinterpretation:

The application root document is loaded and the user isexecuting in it: there is no leaf document.
The application root document and a single leaf document areboth loaded and the user is executing in the leaf document.

If there is a chain of subdialogs defined in separatedocuments, then there may be more than one leaf document loadedalthough execution will only be in one of these documents.

When a leaf document load causes a root document load, none ofthe dialogs in the root document are executed. Execution beginsin the leaf document.

There are several benefits to multi-document applications.

The root document's variables are available for use by theleaf documents, so that information can be shared andretained.
Root document <property> elements specify defaultvalues for properties used in the leaf documents.
Common ECMAScript code can be defined in root document<script> elements and used in the leaf documents.
Root document <catch> elements define default eventhandling for the leaf documents.
Document-scoped grammars in the root document are active whenthe user is in a leaf document, so that the user is able tointeract with forms, links, and menus in the root document.

Here is a two-document application illustrating this:

Application root document (app-root.vxml)

<?xml version="1.0" encoding="UTF-8"?><vxml xmlns="http://www.w3.org/2001/vxml"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"   version="2.0"> <var name="bye" expr="'Ciao'"/> <link next="operator_xfer.vxml">   <grammar type="application/srgs+xml" root="root" version="1.0">     <rule scope="public">operator</rule>  </grammar> </link></vxml>

Leaf document (leaf.vxml)

<?xml version="1.0" encoding="UTF-8"?><vxml xmlns="http://www.w3.org/2001/vxml"    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"    xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"   version="2.0" application="app-root.vxml"> <form>  <field name="answer">     <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>     <prompt>Shall we say <value expr="application.bye"/>?</prompt>     <filled>       <if cond="answer">        <exit/>       </if>       <clear namelist="answer"/>     </filled>  </field> </form></vxml>

In this example, the application is designed so that leaf.vxmlmust be loaded first. Its application attribute specifies thatapp-root.vxml should be used as the application root document.So, app-root.vxml is then loaded, which creates the applicationvariable bye and also defines a link that navigates tooperator-xfer.vxml whenever the user says "operator". The userstarts out in the say_goodbye form:

C: Shall we say Ciao?
H: Si.
C: I did not understand what you said.(aplatform-specific default message.)
C: Shall we say Ciao?
H: Ciao
C: I did not understand what you said.
H: Operator.
C:(Goes to operator_xfer.vxml, whichtransfers the caller to a human operator.)

Note that when the user is in a multi-document application, atmost two documents are loaded at any one time: the applicationroot document and, unless the user is actually interacting withthe application root document, an application leaf document. Aroot document's <vxml> element does not have an applicationattribute specified. A leaf document's <vxml> element doeshave an application attribute specified. An interpreter alwayshas an application root document loaded; it does not always havean application leaf document loaded.

Thename of the interpreter's current application is theapplication root document's absolute URI. The absolute URIincludes a query string, if present, but it does not include afragment identifier. The interpreter remains in the sameapplication as long as the name remains the same. When the namechanges, a new application is entered and its root context isinitialized. The application's root context consists of thevariables, grammars, catch elements, scripts, and properties inapplication scope.

During a user session an interpreter transitions from onedocument to another as requested by <choice>, <goto><link>, <subdialog>, and <submit> elements.Some transitions are within an application, others are betweenapplications. The preservation or initialization of the rootcontext depends on the type of transition:

Root to Leaf Within Application: A root to leaf transition within the same application occurswhen the current document is a root document and the targetdocument's application attribute's value resolves to the sameabsolute URI as the name of the current application. Theapplication root document and its context are preserved.
Leaf to Leaf Within Application: A leaf to leaf transition within the same application occurswhen the current document is a leaf document and the targetdocument's application attribute's value resolves to the sameabsolute URI as the name of the current application. Theapplication root document and its context are preserved.
Leaf to Root Within Application: A leaf to root transition within the same application occurswhen the current document is a leaf document and the targetdocument's absolute URI is the same as the name of the currentapplication. The current application root document and itscontext are preserved when the transition is caused by a<choice>, <goto>, or <link> element. The rootcontext is initialized when a <submit> element causes theleaf to root transition, because a <submit> always resultsin a fetch of its URI.
Root to Root: A root to root transition occurs when the current document isa root document and the target document is a root document, i.e.it does not have an application attribute. The root context isinitialized with the application root document returned by thecaching policy inSection6.1.2. The caching policy is consulted even when thename of the targetapplication and the current application are the same.
Subdialog: A subdialog invocation occurs when a root or leaf documentexecutes a <subdialog> element. As discussed inSection 2.3.4, subdialoginvocation creates a new execution context. The application rootdocument and its context in the calling document's executioncontext are preserved untouched during subdialog execution, andare used again once the subdialog returns. A subdialog's newexecution context has its own root context and, possibly, leafcontext. When the subdialog is invoked with a non-empty URIreference, the caching policy inSection 6.1.2 is used to acquire the root andleaf documents that will be used to initialize the new root andleaf contexts. If a subdialog is invoked with an empty URIreference and a fragment identifier, e.g. "#sub1", the root andleaf documents remain unchanged, and therefore the current rootand leaf documents will be used to initialize the new root andleaf contexts.
Inter-Application Transitions: All other transitions are between applications which causethe application root context to be initialized with the nextapplication's root document.

If a document refers to a non-existent application rootdocument, an error.badfetch event is thrown. If a document'sapplication attribute refers to a document that also has anapplication attribute specified, an error.semantic event isthrown.

The following diagrams illustrate the effect of thetransitions between root and leaf documents on the applicationroot context. In these diagrams, boxes represent documents, boxtexture changes identify root context initialization, solidarrows symbolize transitions to the URI in the arrow's label,dashed vertical arrows indicate an application attribute whoseURI is the arrow's label.

Figure 3: Transitions that Preserve the Root Context

In this diagram, all the documents belong to the sameapplication. The transitions are identified by the numbers 1-4across the top of the figure. They are:

A transition to URI A results in document 1, the applicationcontext is initialized from document 1's content. Assume thatthis is the first document in the session. The currentapplication's name is A.
Document 1 specifies a transition to URI B, which yieldsdocument 2. Document 2's application attribute equals URI A. Theroot is document 1 with its context preserved. This is a root toleaf transition within the same application.
Document 2 specifies a transition to URI C, which yieldsanother leaf document, document 3. Its application attribute alsoequals URI A. The root is document 1 with its context preserved.This is a leaf to leaf transition within the sameapplication.
Document 3 specifies a transition to URI A using a<choice>, <goto>, or <link>. Document 1 is usedwith its root context intact. This is a leaf to root transitionwithin the same application.

The next diagram illustrates transitions which initialize theroot context.

Figure 4: Transitions that Initialize the Root Context

Document 1 specifies a transition to its own URI A. Theresulting document 4 does not have an application attribute, soit is considered a root document, and the root context isinitialized. This is a root to root transition.
Document 4 specifies a transition to URI D, which yields aleaf document 5. Its application attribute is different: URI E. Anew application is being entered. URI E produces the rootdocument 6. The root context is initialized from the content ofdocument 6. This is an inter-application transition.
Document 5 specifies a transition to URI A. The cache checkreturns document 4 which does not have an application attributeand therefore belongs to application A, so the root context isinitialized. Initialization occurs even though this applicationand this root document were used earlier in the session. This isan inter-application transition.

1.5.3 Subdialogs

A subdialog is a mechanism for decomposing complex sequencesof dialogs to better structure them, or to create reusablecomponents. For example, the solicitation of account informationmay involve gathering several pieces of information, such asaccount number, and home telephone number. A customer careservice might be structured with several independent applicationsthat could share this basic building block, thus it would bereasonable to construct it as a subdialog. This is illustrated inthe example below. The first document, app.vxml, seeks to adjusta customer's account, and in doing so must get the accountinformation and then the adjustment level. The accountinformation is obtained by using a subdialog element that invokesanother VoiceXML document to solicit the user input. While thesecond document is being executed, the calling dialog issuspended, awaiting the return of information. The seconddocument provides the results of its user interactions using a<return> element, and the resulting values are accessedthrough the variable defined by the name attribute on the<subdialog> element.

Customer Service Application (app.vxml)

<?xml version="1.0" encoding="UTF-8"?><vxml xmlns="http://www.w3.org/2001/vxml"    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"    xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"   version="2.0">  <form>    <var name="account_number"/>    <var name="home_phone"/>    <subdialog name="accountinfo" src="acct_info.vxml#basic">      <filled>        <!-- Note the variable defined by "accountinfo" is          returned as an ECMAScript object and it contains two          properties defined by the variables specified in the          "return" element of the subdialog. -->        <assign name="account_number" expr="accountinfo.acctnum"/>        <assign name="home_phone" expr="accountinfo.acctphone"/>      </filled>    </subdialog>    <field name="adjustment_amount">     <grammar type="application/srgs+xml" src="/grammars/currency.grxml"/>      <prompt>        What is the value of your account adjustment?      </prompt>      <filled>        <submit  next="/cgi-bin/updateaccount"/>      </filled>    </field>  </form></vxml>

Document Containing Account Information Subdialog(acct_info.vxml)

<?xml version="1.0" encoding="UTF-8"?><vxml xmlns="http://www.w3.org/2001/vxml"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"    xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"   version="2.0">  <form>    <field name="acctnum">      <grammar type="application/srgs+xml" src="/grammars/digits.grxml"/>      <prompt> What is your account number? </prompt>    </field>    <field name="acctphone">      <grammar type="application/srgs+xml" src="/grammars/phone_numbers.grxml"/>      <prompt> What is your home telephone number? </prompt>      <filled>        <!-- The values obtained by the two fields are supplied          to the calling dialog by the "return" element. -->        <return namelist="acctnum acctphone"/>      </filled>    </field>  </form></vxml>

Subdialogs add a new execution context when they areinvoked.The subdialog could be a new dialog within the existingdocument, or a new dialog within a new document.

Subdialogs can be composed of several documents. Figure 5shows the execution flow where a sequence of documents (D)transitions to a subdialog (SD) and then back.

Figure 5: Subdialog composed of several documentsreturning from the last subdialog document.

The execution context in dialog D2 is suspended when itinvokes the subdialog SD1 in document sd1.vxml. This subdialogspecifies execution is to be transfered to the dialog in sd2.vxml(using <goto>). Consequently, when the dialog in sd2.vxmlreturns, control is returned directly to dialog D2.

Figure 6 shows an example of a multi-document subdialog wherecontrol is transferred from one subdialog to another.

Figure 6: Subdialog composed of several documentsreturning from the first subdialog document.

The subdialog in sd1.vxml specifies that control is to betransfered to a second subdialog, SD2, in sd2.vxml. Whenexecuting SD2, there are two suspended contexts: the dialogcontext in D2 is suspending awaiting SD1 to return; and thedialog context in SD1 awaiting SD2 to return. When SD2 returns,control is returned to the SD1. It in turn returns control todialog D2.

1.5.4 Final Processing

Under certain circumstances (in particular, while the VoiceXMLinterpreter is processing a disconnect event) the interpreter maycontinue executing in thefinal processing state afterthere is no longer a connection to allow the interpreter tointeract with the end user. The purpose of this state is to allowthe VoiceXML application to perform any necessary final cleanup,such as submitting information to the application server. Forexample, the following <catch> element will catch theconnection.disconnect.hangup event and execute in the finalprocessing state:

<catch event="connection.disconnect.hangup">    <submit namelist="myExit" next="http://mysite/exit.jsp"/></catch>

While in the final processing state the application mustremain in the transitioning state and may not enter the waitingstate (as described inSection4.1.8). Thus for example the application should not enter<field>, <record>, or <transfer> while in thefinal processing state. The VoiceXML interpreter must exit if theVoiceXML application attempts to enter the waiting state while inthe final processing state.

Aside from this restriction, execution of the VoiceXMLapplication continues normally while in the final processingstate. Thus for example the application may transition betweendocuments while in the final processing state, and theinterpreter must exit if no form item is eligible to be selected(as described inSection2.1.1).

2. DialogConstructs

2.1Forms

Forms are the key component of VoiceXML documents. A formcontains:

A set ofform items, elements that are visited in themain loop of the form interpretation algorithm. Form items aresubdivided intoinput items that can be 'filled' by userinput andcontrol items that cannot.
Declarations of non-form item variables.
Event handlers.
"Filled" actions, blocks of procedural logic that execute whencertain combinations of input item variables are assigned.

Form attributes are:

Table 3: <form>Attributes
id	The name of the form. If specified,the form can be referenced within the document or from anotherdocument. For instance <form>, <gotonext="#weather">.
scope	The default scope of the form'sgrammars. If it is dialog then the form grammars are active onlyin the form. If the scope is document, then the form grammars areactive during any dialog in the same document. If the scope isdocument and the document is an application root document, thenthe form grammars are active during any dialog in any document ofthis application. Note that the scope of individual form grammarstakes precedence over the default scope; for example, in non-rootdocuments a form with the default scope "dialog", and a formgrammar with the scope "document", then that grammar is active inany dialog in the document.

This section describes some of the concepts behind forms, andthen gives some detailed examples of their operation.

2.1.1 Form Interpretation

Forms are interpreted by an implicit form interpretationalgorithm (FIA). The FIA has a main loop that repeatedly selectsa form item and then visits it. The selected form item is thefirst in document order whose guard condition is not satisfied.For instance, a field's default guard condition tests tosee if the field's form item variable has a value, so thatif a simple form contains only fields, the user will be promptedfor each field in turn.

Interpreting a form item generally involves:

Selecting and playing one or more prompts;
Collecting a user input, either a response that fills in oneor more input items, or a throwing of some event (help, forinstance); and
Interpreting any <filled> actions that pertained to thenewly filled in input items.

The FIA ends when it interprets a transfer of controlstatement (e.g. a <goto> to another dialog or document, ora <submit> of data to the document server). It also endswith an implied <exit> when no form item remains eligibleto select.

The FIA is described in more detail inSection 2.1.6.

2.1.2 Form Items

Form items are the elements that can be visited in the mainloop of the form interpretation algorithm. Input items direct theFIA to gather a result for a specific element. When the FIAselects a control item, the control item may contain a block ofprocedural code to execute, or it may tell the FIA to set up theinitial prompt-and-collect for a mixed initiative form.

2.1.2.1 Input Items

An input item specifies aninput item variable togather from the user. Input items have prompts to tell the userwhat to say or key in, grammars that define the allowed inputs,and event handlers that process any resulting events. An inputitem may also have a <filled> element that defines anaction to take just after the input item variable is filled.Input items consist of:

Table 4: Input Items
<field>	An input item whose value is obtainedvia ASR or DTMF grammars.
<record>	An input item whose value is an audioclip recorded by the user. A <record> element could collecta voice mail message, for instance.
<transfer>	An input item which transfers theuser to another telephone number. If the transfer returnscontrol, the field variable will be set to the resultstatus.
<object>	This input item invokes aplatform-specific "object" with various parameters. The result ofthe platform object is an ECMAScript Object. One platform objectcould be a builtin dialog that gathers credit card information.Another could gather a text message using some proprietary DTMFtext entry method. There is no requirement for implementations toprovide platform-specific objects, although implementations musthandle the <object> element by throwingerror.unsupported.objectname if the particular platform-specificobject is not supported (note that 'objectname' inerror.unsupported.objectname is a fixed string, so notsubstituted with the name of the unsupported object; morespecific error information may be provided in the event"_message" special variable as described inSection 5.2.2).
<subdialog>	A <subdialog> input item isroughly like a function call. It invokes another dialog on thecurrent page, or invokes another VoiceXML document. It returns anECMAScript Object as its result.

2.1.2.2 Control Items

There are two types of control items:

Table 5: Control Items
<block>	A sequence of procedural statementsused for prompting and computation, but not for gathering input.A block has a (normally implicit) form item variable that is setto true just before it is interpreted.
<initial>	This element controls the initialinteraction in a mixed initiative form. Its prompts should bewritten to encourage the user to say something matching a formlevel grammar. When at least one input item variable is filled asa result of recognition during an <initial> element, theform item variable of <initial> becomes true, thus removingit as an alternative for the FIA.

2.1.3 Form Item Variables and Conditions

Each form item has an associatedform item variable,which by default is set to undefined when the form is entered.This form item variable will contain the result of interpretingthe form item. An input item's form item variable is alsocalled aninput item variable, and it holds the valuecollected from the user. A form item variable can be given a nameusing the name attribute, or left nameless, in which case aninternal name is generated.

Each form item also has aguard condition, whichgoverns whether or not that form item can be selected by the forminterpretation algorithm. The default guard condition just teststo see if the form item variable has a value. If it does, theform item will not be visited.

Typically, input items are given names, but control items arenot. Generally form item variables are not given initial valuesand additional guard conditions are not specified. But sometimesthere is a need for more detailed control. One form may have aform item variable initially set to hide a field, and latercleared (e.g., using <clear>) to force the field'scollection. Another field may have a guard condition thatactivates it only when it has not been collected, and when twoother fields have been filled. A block item could execute onlywhen some condition holds true. Thus, fine control can beexercised over the order in which form items are selected andexecuted by the FIA, however in general, many dialogs can beconstructed without resorting to this level of complexity.

In summary, all form items have the following attributes:

Table 6: Common Form ItemAttributes
name	The name of a dialog-scoped form itemvariable that will hold the value of the form item.
expr	The initial value of the form itemvariable; default is ECMAScript undefined. If initialized to avalue, then the form item will not be executed unless the formitem variable is cleared.
cond	An expression to evaluate inconjunction with the test of the form item variable. If absent,this defaults to true, or in the case of <initial>, a testto see if any input item variable has been filled in.

2.1.4 Directed Forms

The simplest and most common type of form is one in which theform items are executed exactly once in sequential order toimplement a computer-directed interaction. Here is a weatherinformation service that uses such a form.

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form> <block>Welcome to the weather information service.</block> <field name="state">  <prompt>What state?</prompt>  <grammar src="state.grxml"  type="application/srgs+xml"/>  <catch event="help">     Please speak the state for which you want the weather.  </catch> </field> <field name="city">  <prompt>What city?</prompt>  <grammar src="city.grxml" type="application/srgs+xml"/>  <catch event="help">     Please speak the city for which you want the weather.  </catch> </field> <block>  <submit next="/servlet/weather" namelist="city state"/> </block></form></vxml>

This dialog proceeds sequentially:

C (computer): Welcome to the weather information service. Whatstate?
H (human): Help
C: Please speak the state for which you want the weather.
H: Georgia
C: What city?
H: Tblisi
C: I did not understand what you said. What city?
H: Macon
C: The conditions in Macon Georgia are sunny and clear at 11AM ...

The form interpretation algorithm's first iterationselects the first block, since its (hidden) form item variable isinitially undefined. This block outputs the main prompt, and itsform item variable is set to true. On the FIA's seconditeration, the first block is skipped because its form itemvariable is now defined, and the state field is selected becausethe dialog variable state is undefined. This field prompts theuser for the state, and then sets the variable state to theanswer. A detailed description of the filling of form itemvariables from a field-level grammar may be found inSection 3.1.6. The third formiteration prompts and collects the city field. The fourthiteration executes the final block and transitions to a differentURI.

Each field in this example has a prompt to play in order toelicit a response, a grammar that specifies what to listen for,and an event handler for the help event. The help event is thrownwhenever the user asks for assistance. The help event handlercatches these events and plays a more detailed prompt.

Here is a second directed form, one that prompts for creditcard information:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form> <block>We now need your credit card type, number,    and expiration date.</block> <field name="card_type">  <prompt count="1">What kind of credit card    do you have?</prompt>  <prompt count="2">Type of card?</prompt>  <!-- This is an inline grammar. -->  <grammar type="application/srgs+xml" root="r2" version="1.0">    <rule scope="public">       <one-of>       <item>visa</item>       <item>master <item repeat="0-1">card</item></item>       <item>amex</item>       <item>american express</item>       </one-of>    </rule>  </grammar>  <help> Please say Visa, MasterCard, or American Express.</help> </field> <field name="card_num">  <grammar type="application/srgs+xml" src="/grammars/digits.grxml"/>  <prompt count="1">What is your card number?</prompt>  <prompt count="2">Card number?</prompt>  <catch event="help">  <if cond="card_type =='amex' || card_type =='american express'">       Please say or key in your 15 digit card number.     <else/>       Please say or key in your 16 digit card number.     </if>  </catch>  <filled>     <if cond="(card_type == 'amex' || card_type =='american express')           &amp;&amp; card_num.length != 15">       American Express card numbers must have 15 digits.       <clear namelist="card_num"/>       <throw event="nomatch"/>     <elseif cond="card_type != 'amex'                &amp;&amp; card_type !='american express'                &amp;&amp; card_num.length != 16"/>       MasterCard and Visa card numbers have 16 digits.       <clear namelist="card_num"/>       <throw event="nomatch"/>     </if>  </filled> </field> <field name="expiry_date">   <grammar type="application/srgs+xml" src="/grammars/digits.grxml"/>   <prompt count="1">What is your card's expiration date?</prompt>   <prompt count="2">Expiration date?</prompt>  <help>     Say or key in the expiration date, for example one two oh one.  </help>  <filled>     <!-- validate the mmyy -->     <var name="mm"/>     <var name="i" expr="expiry_date.length"/>     <if cond="i == 3">       <assign name="mm" expr="expiry_date.substring(0,1)"/>     <elseif cond="i == 4"/>       <assign name="mm" expr="expiry_date.substring(0,2)"/>     </if>     <if cond="mm == '' || mm &lt; 1 || mm &gt; 12">       <clear namelist="expiry_date"/>       <throw event="nomatch"/>     </if>  </filled> </field> <field name="confirm">  <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>  <prompt>      I have <value expr="card_type"/> number      <value expr="card_num"/>, expiring on      <value expr="expiry_date"/>.      Is this correct?  </prompt>  <filled>    <if cond="confirm">      <submit next="place_order.asp"        namelist="card_type card_num expiry_date"/>    </if>    <clear namelist="card_type card_num expiry_date confirm"/>  </filled> </field></form></vxml>

Note that the grammar alternatives 'amex' and 'americanexpress' return literal values which need to be handledseparately in the conditional expressions.Section 3.1.5 describes how semantic attachmentsin the grammar can be used to return a single representation ofthese inputs.

The dialog might go something like this:

C: We now need your credit card type, number, and expirationdate.
C: What kind of credit card do you have?
H: Discover
C: I did not understand what you said.(aplatform-specific default message.)
C: Type of card?(the second prompt isused now.)
H: Shoot.(fortunately treated as "help"by this platform)
C: Please say Visa, MasterCard, or American Express.
H: Uh, Amex.(this platform ignores"uh")
C: What is your card number?
H: One two three four ... wait ...
C: I did not understand what you said.
C: Card number?
H:(uses DTMF) 1 2 3 4 5 6 7 8 9 01 2 3 4 5 #
C: What is your card's expiration date?
H: one two oh one
C: I have Amex number 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 expiringon 1 2 0 1. Is this correct?
H: Yes

Fields are the major building blocks of forms. A fielddeclares a variable and specifies the prompts, grammars, DTMFsequences, help messages, and other event handlers that are usedto obtain it. Each field declares a VoiceXML form item variablein the form's dialog scope. These may be submitted once theform is filled, or copied into other variables.

Each field has its own speech and/or DTMF grammars, specifiedexplicitly using <grammar> elements, or implicitly usingthe type attribute. The type attribute is used for builtingrammars, like digits, boolean, or number.

Each field can have one or more prompts. If there is one, itis repeatedly used to prompt the user for the value until one isprovided. If there are many, prompts are selected for playbackaccording to the prompt selection algorithm (seeSection 4.1.6). The countattribute can be used to determine which prompts to use on eachattempt. In the example, prompts become shorter. This is calledtapered prompting.

The <catch event="help"> elements are event handlersthat define what to do when the user asks for help. Help messagescan also be tapered. These can be abbreviated, so that thefollowing two elements are equivalent:

<catch event="help"> Please say visa, mastercard, or amex.</catch><help> Please say visa, mastercard, or amex.</help>

The <filled> element defines what to do when the userprovides a recognized input for that field. One use is to specifyintegrity constraints over and above the checking done by thegrammars, as with the date field above.

2.1.5 Mixed Initiative Forms

The last section talked about forms implementing rigid,computer-directed conversations. To make a formmixedinitiative, where both the computer and the human directthe conversation, it must have one or more form-level grammars.The dialog may be written in several ways. One common authoringtyle combines an <initial> element that prompts for ageneral response with <field> elements that prompt forspecific information. This is illustrated in the examplebelow. More complex techniques, such as using the 'cond'attribute on <field> elements, may achieve a similareffect.

If aform has form-level grammars:

Its input items can be filled in any order.
More than one input item can be filled as a result of a singleuser utterance.

Only input items (and not control items) can be filled as aresult of matching a form-level grammar. The filling of fieldvariables when using a form-level grammar is described inSection 3.1.6.

Also, the form's grammars can be active when the user isin other dialogs. If a document has two forms on it, say a carrental form and a hotel reservation form, and both forms havegrammars that are active for that document, a user could respondto a request for hotel reservation information with informationabout the car rental, and thus direct the computer to talk aboutthe car rental instead. The user can speak to any active grammar,and have input items set and actions taken in response.

Example. Here is a second version of the weatherinformation service, showing mixed initiative. It has been"enhanced" for illustrative purposes with advertising and with aconfirmation of the city and state:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form> <grammar src="cityandstate.grxml" type="application/srgs+xml"/> <!-- Caller can't barge in on today's advertisement. --> <block>  <prompt bargein="false">     Welcome to the weather information service.     <audio src="http://www.online-ads.example.com/wis.wav"/>  </prompt> </block> <initial name="start">  <prompt>     For what city and state would you like the weather?  </prompt>  <help>     Please say the name of the city and    state for which you would like a weather report.  </help>  <!-- If user is silent, reprompt once, then   try directed prompts. -->  <noinput count="1"> <reprompt/></noinput>  <noinput count="2"> <reprompt/>    <assign name="start" expr="true"/></noinput> </initial> <field name="state">  <prompt>What state?</prompt>  <help>    Please speak the state for which you want the weather.  </help> </field> <field name="city">  <prompt>Please say the city in <value expr="state"/>                 for which you want the weather.</prompt>  <help>Please speak the city for which you    want the weather.</help>  <filled>     <!-- Most of our customers are in LA. -->     <if cond="city == 'Los Angeles' &amp;&amp; state == undefined">       <assign name="state" expr="'California'"/>     </if>  </filled> </field> <field name="go_ahead" modal="true">  <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>  <prompt>Do you want to hear the weather for    <value expr="city"/>, <value expr="state"/>?  </prompt>  <filled>     <if cond="go_ahead">       <prompt bargein="false">         <audio src="http://www.online-ads.example.com/wis2.wav"/>       </prompt>       <submit next="/servlet/weather" namelist="city state"/>     </if>     <clear namelist="start city state go_ahead"/>  </filled> </field></form></vxml>

Here is a transcript showing the advantages for even a noviceuser:

C: Welcome to the weather information service. Buy Joe'sSpicy Shrimp Sauce.
C: For what city and state would you like the weather?
H: Uh, California.
C: Please say the city in California for which you want theweather.
H: San Francisco, please.
C: Do you want to hear the weather for San Francisco,California?
H: No
C: For what city and state would you like the weather?
H: Los Angeles.
C: Do you want to hear the weather for Los Angeles,California?
H: Yes
C: Don't forget, buy Joe's Spicy Shrimp Saucetonight!
C: Mostly sunny today with highs in the 80s. Lows tonight fromthe low 60s ...

The go_ahead field has its modal attribute set to true. Thiscauses all grammars to be disabled except the ones defined in thecurrent form item, so that the only grammar active during thisfield is the grammar for boolean.

An experienced user can get things done much faster (but isstill forced to listen to the ads):

C: Welcome to the weather information service. Buy Joe'sSpicy Shrimp Sauce.
C: What ...
H(barging in): LA
C: Do you ...
H(barging in): Yes
C: Don't forget, buy Joe's Spicy Shrimp Saucetonight!
C: Mostly sunny today with highs in the 80s. Lows tonight fromthe low 60s ...

2.1.5.1 Controlling the order of fieldcollection.

The form interpretation algorithm can be customized in severalways. One way is to assign a value to a form item variable, sothat its form item will not be selected. Another is to use<clear> to set a form item variable to undefined; thisforces the FIA to revisit the form item again.

Another method is to explicitly specify the next form item tovisit using <goto nextitem>. This forces an immediatetransfer to that form item even if any cond attribute presentevaluates to "false". No variables, conditions or counters inthe targeted form item will be reset. The form item's promptwill be played even if it has already been visited. If the<goto nextitem> occurs in a <filled> action, the restof the <filled> action and any pending <filled>actions will be skipped.

Here is an example <goto nextitem> executed in responseto the exit event:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><link event="exit"> <grammar type="application/srgs+xml" src="/grammars/exit.grxml"/></link><form>  <catch event="exit">    <reprompt/>    <goto nextitem="confirm_exit"/>  </catch>  <block>   <prompt>     Hello, you have been called at random to answer questions     critical to U.S. foreign policy.   </prompt>  </block>  <field name="q1">   <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>   <prompt>Do you agree with the IMF position on     privatizing certain functions of Burkina Faso's     agriculture ministry?</prompt>  </field>  <field name="q2">    <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>    <prompt>If this privatization occurs, will its      effects be beneficial mainly to Ouagadougou and      Bobo-Dioulasso?</prompt>  </field>  <field name="q3">    <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>    <prompt>Do you agree that sorghum and millet output      might thereby increase by as much as four percent per      annum?</prompt>  </field>  <block>    <submit next="register" namelist="q1 q2 q3"/>  </block>  <field name="confirm_exit">    <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>    <prompt>You have elected to exit.  Are you      sure you want to do this, and perhaps adversely affect      U.S. foreign policy vis-a-vis sub-Saharan Africa for      decades to come?</prompt>    <filled>      <if cond="confirm_exit">        Okay, but the U.S. State Department is displeased.        <exit/>      <else/>        Good, let's pick up where we left off.        <clear namelist="confirm_exit"/>      </if>    </filled>    <catch event="noinput nomatch">      <throw event="exit"/>    </catch>  </field></form></vxml>

If the user says "exit" in response to any of the surveyquestions, an exit event is thrown by the platform and caught bythe <catch> event handler. This handler directs thatconfirm_exit be the next visited field. The confirm_exit fieldwould not be visited during normal completion of the surveybecause the preceding <block> element transfers control tothe registration script.

2.1.6 Form Interpretation Algorithm

We've presented the form interpretation algorithm (FIA)at a conceptual level. In this section we describe it in moredetail. A more formal description is provided inAppendix C.

2.1.6.1 Initialization Phase

Whenever a form is entered, it is initialized. Internal promptcounter variables (in the form's dialog scope) are reset to1. Each variable (form-level <var> elements and form itemvariables) is initialized, in document order, to undefined or tothe value of the relevant expr attribute.

2.1.6.2 Main Loop

The main loop of the FIA has three phases:

Theselect phase: the next unfilled form item isselected for visiting.

Thecollect phase: the selectedform item isvisited, which prompts the user for input, enables theappropriate grammars, and then waits for and collects aninput (such as a spoken phrase or DTMF key presses) oranevent (such as a request for help or a no inputtimeout).

Theprocess phase: an input is processed by fillingform items and executing <filled> elements to performactions such as input validation. An event is processed byexecuting the appropriate event handler for that event type.

Note that the FIA may be given an input (a set of grammarslot/slot value pairs) that was collected while the user was in adifferent form's FIA. In this case the first iteration ofthe main loop skips the select and collect phases, and goes rightto the process phase with that input. Also note that if anerror occurs in the select or collect phase that causes an eventto be generated, the event is thrown and the FIA moves directlyinto the process phase.

2.1.6.2.1 Select phase

The purpose of the select phase is to select the next formitem to visit. This is done as follows:

If a <goto> from the last main loop iteration'sprocess phase specified a <goto nextitem>, then thespecified form item is selected.

Otherwise the first form item whoseguard conditionis false is chosen to be visited. If an error occurs whilechecking guard conditions, the event is thrown which skips thecollect phase, and is handled in the process phase.

If no guard condition is false, and the last iterationcompleted the form without encountering an explicit transfer ofcontrol, the FIA does an implicit <exit> operation(similarly, if execution proceeds outside of a form, such as whenan error is generated outside of a form, and there is no explicittransfer of control, the interpreter will perform an implicit<exit> operation).

2.1.6.2.2 Collect phase

The purpose of the collect phase is to collect an input or anevent. The selected form item isvisited, which performsactions that depend on the type of form item:

If a <field> or <record> is visited,the FIA selects and queues up any prompts based on theitem's prompt counter and the prompt conditions. Then itactivates and listens for the field level grammar(s) and anyactive higher-level grammars, and waits for the item to befilled or for some event to be generated.

If a <transfer> is visited, the prompts are queued basedon the item's prompt counter and the prompt conditions. Theitem grammars are activated. The queue is played before thetransfer is executed.

If a <subdialog> or <object> is visited, theprompts are queued based on the item's prompt counter andthe prompt conditions. Grammars are not activated. Instead, theinput collection behavior is specified by the executing contextfor the subdialog or object. The queue is not played before thesubdialog or object is executed, but instead should be playedduring the subsequent input collection.

If an <initial> is visited, the FIA selects and queuesup prompts based on the <initial>'s prompt counterand prompt conditions. Then it listens for the form levelgrammar(s) and any active higher-level grammars. It waits for agrammar recognition or for an event.

A <block> element is visited by setting its form itemvariable to true, evaluating its content, and then bypassing theprocess phase. No input is collected, and the next iteration ofthe FIA's main loop is entered.

2.1.6.2.3 Process phase

The purpose of the process phase is to process the input orevent collected during the previous phases, as follows:

If an event (such as a noinput or a hangup) occurred, thenthe applicable catch element is identified and executed.Selection of the applicable catch element starts in the scope ofthe current form item and then proceeds outward by enclosingdialog scopes. This can cause the FIA to terminate (e.g. if ittransitions to a different dialog or document or it does an<exit>), or it can cause the FIA to go into the nextiteration of the main loop (e.g. as when the default help eventhandler is executed).
If an input matches a grammar from a <link> then thatlink's transition is executed, or its event is thrown. Ifthe <link> throws an event, the event is processed in thecontext of the current form item (e.g. <initial>,<field>, <transfer>, and so forth).
If an input matches a grammar in a form other than thecurrent form, then the FIA terminates, the other form isinitialized, and that form's FIA is started with this inputin its process phase.

If an input matches a grammar in this form, then:

The semantic result from the grammar is mapped into one ormore form item variables as described inSection 3.1.6.
The <filled> actions triggered by these assignments areidentified as described inSection2.4.
Each identified <filled> action is executed in documentorder. If a <submit>, <disconnect>, <exit>,<return>, <goto> or <throw> is encountered, theremaining <filled> elements are not executed, and the FIAeither terminates or continues in the next main loop iteration.<reprompt> does not terminate the FIA (the name suggests anaction), but rather just sets a flag that affects the treatmentof prompts on the subsequent iteration of the FIA. If an event isthrown during the execution of a <filled>, event handlerselection starts in the scope of the <filled>, which couldbe a form item or the form itself, and then proceeds outward byenclosing dialog scopes.

After completion of the process phase, interpretationcontinues by returning to the select phase.

A more detailed form interpretation algorithm can be found inAppendix C.

2.2Menus

Amenu is a convenient syntactic shorthand for a formcontaining a single anonymous field that prompts the user to makea choice and transitions to different places based on thatchoice. Like a regular form, it can have its grammar scoped suchthat it is active when the user is executing another dialog. Thefollowing menu offers the user three choices:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><menu>  <prompt>    Welcome home. Say one of: <enumerate/>  </prompt>  <choice next="http://www.sports.example.com/vxml/start.vxml">     Sports  </choice>  <choice next="http://www.weather.example.com/intro.vxml">     Weather  </choice>  <choice next="http://www.stargazer.example.com/voice/astronews.vxml">     Stargazer astrophysics news  </choice>  <noinput>Please say one of <enumerate/></noinput></menu></vxml>

This dialog might proceed as follows:

C: Welcome home. Say one of: sports; weather; Stargazerastrophysics news.
H: Astrology.
C: I did not understand what you said.(aplatform-specific default message.)
C: Welcome home. Say one of: sports; weather; Stargazerastrophysics news.
H: sports.
C:(proceeds tohttp://www.sports.example.com/vxml/start.vxml)

2.2.1menu element

This identifies the menu, and determines the scope of itsgrammars. The menu element's attributes are:

Table 7: <menu>Attributes
id	The identifier of the menu. It allowsthe menu to be the target of a <goto> or a<submit>.
scope	The menu's grammar scope. If itis dialog (the default), the menu's grammars are onlyactive when the user transitions into the menu. If the scope isdocument, its grammars are active over the whole document (or ifthe menu is in the application root document, any loaded documentin the application).
dtmf	When set to true, the first ninechoices that have not explicitly specified a value for the dtmfattribute are given the implicit ones "1", "2", etc. Remainingchoices that have not explicitly specified a value for the dtmfattribute will not be assigned DTMF values (and thus cannot bematched via a DTMF keypress). If there are choices which havespecified their own DTMF sequences to be something other than"*", "#", or "0", an error.badfetch will be thrown. The defaultis false.
accept	When set to "exact" (the default),the text of the choice elements in the menu defines the exactphrase to be recognized. When set to "approximate", the text ofthe choice elements defines an approximate recognition phrase (asdescribed underSection2.2.5). Each <choice> can override this setting.

2.2.2choice element

The <choice> element serves several purposes:

It may specify a speech grammar, defined either using a<grammar> element or automatically generated by the processdescribed inSection2.2.5.
It may specify a DTMF grammar, as discussed inSection 2.2.3.
The contents may be used to form the <enumerate> promptstring. This is described inSection 2.2.4.
And it specifies either an event to be thrown or the URI to goto when the choice is selected.

The choice element's attributes are:

Table 8: <choice>Attributes
dtmf	The DTMF sequence for this choice. Itis equivalent to a simple DTMF <grammar> and DTMFproperties (Section 6.3.3)apply to recognition of the sequence. Unlike DTMF grammars,whitespace is optional: dtmf="123#" is equivalent to dtmf="1 2 3#".
accept	Override the setting for accept in<menu> for this particular choice. When set to "exact" (thedefault), the text of the choice element defines the exact phraseto be recognized. When set to "approximate", the text of thechoice element defines an approximate recognition phrase (asdescribed underSection2.2.5).
next	The URI of next dialog ordocument.
expr	Specify an expression to evaluate asa URI to transition to instead of specifying a next.
event	Specify an event to be thrown insteadof specifying a next.
eventexpr	An ECMAScript expression evaluatingto the name of the event to be thrown.
message	A message string providing additionalcontext about the event being thrown. The message is available asthe value of a variable within the scope of the catch element,seeSection 5.2.2.
messageexpr	An ECMAScript expression evaluatingto the message string.
fetchaudio	SeeSection 6.1. This defaults to the fetchaudioproperty.
fetchhint	SeeSection 6.1. This defaults to thedocumentfetchhint property.
fetchtimeout	SeeSection 6.1. This defaults to the fetchtimeoutproperty.
maxage	SeeSection 6.1. This defaults to the documentmaxageproperty.
maxstale	SeeSection 6.1. This defaults to thedocumentmaxstale property.

Exactly one of "next", "expr", "event" or "eventexpr" must bespecified; otherwise, an error.badfetch event is thrown. Exactlyone of "message" or "messageexpr" may be specified; otherwise, anerror.badfetch event is thrown.

If a <grammar> element is specified in <choice>,then the external grammar is used instead of an automaticallygenerated grammar. This allows the developer to precisely controlthe <choice> grammar; for example:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><menu>  <choice next="http://www.sports.example.com/vxml/start.vxml">    <grammar src="sports.grxml" type="application/srgs+xml"/>    Sports  </choice>  <choice next="http://www.weather.example.com/intro.vxml">   <grammar src="weather.grxml" type="application/srgs+xml"/>   Weather  </choice>  <choice next="http://www.stargazer.example.com/voice/astronews.vxml">   <grammar src="astronews.grxml" type="application/srgs+xml"/>   Stargazer astrophysics  </choice></menu></vxml>

2.2.3DTMF in Menus

Menus can rely purely on speech, purely on DTMF, or both incombination by including a <property> element in the<menu>. Here is a DTMF-only menu with explicit DTMFsequences given to each choice, using the choice's dtmfattribute:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><menu>  <property name="inputmodes" value="dtmf"/>  <prompt>   For sports press 1, For weather press 2, For Stargazer   astrophysics press 3.  </prompt>  <choice dtmf="1" next="http://www.sports.example.com/vxml/start.vxml"/>  <choice dtmf="2" next="http://www.weather.example.com/intro.vxml"/>  <choice dtmf="3" next="http://www.stargazer.example.com/astronews.vxml"/></menu></vxml>

Alternatively, you can set the <menu>'s dtmfattribute to true to assign sequential DTMF digits to each of thefirst nine choices that have not specified their own DTMFsequences: the first choice has DTMF "1", and so on:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><menu dtmf="true">  <property name="inputmodes" value="dtmf"/>  <prompt>   For sports press 1, For weather   press 2, For Stargazer astrophysics press 3.  </prompt>  <choice next="http://www.sports.example.com/vxml/start.vxml"/>  <choice next="http://www.weather.example.com/intro.vxml"/>  <choice dtmf="0" next="#operator"/>  <choice next="http://www.stargazer.example.com/voice/astronews.vxml"/></menu></vxml>

2.2.4enumerate element

The <enumerate> element is an automatically generateddescription of the choices available to the user. It specifies atemplate that is applied to each choice in the order they appearin the menu. If it is used with no content, a default templatethat lists all the choices is used, determined by the interpretercontext. If it has content, the content is the templatespecifier. This specifier may refer to two special variables:_prompt is the choice's prompt, and _dtmf is a normalizedrepresentation (i.e. a single whitespace between DTMF tokens)of the choice's assigned DTMF sequence (note that if no DTMFsequence is assigned to the choice element, or if a<grammar> element is specified in <choice>, thenthe _dtmf variable is assigned the ECMAScript undefined value ).For example, if the menu were rewritten as

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><menu dtmf="true">  <prompt>    Welcome home.    <enumerate>     For <value expr="_prompt"/>, press <value     expr="_dtmf"/>.   </enumerate>   </prompt>   <choice next="http://www.sports.example.com/vxml/start.vxml">      sports </choice>   <choice next="http://www.weather.example.com/intro.vxml">      weather </choice>   <choice next="http://www.stargazer.example.com/voice/astronews.vxml">      Stargazer astrophysics news   </choice></menu></vxml>

then the menu's prompt would be:

C: Welcome home. For sports, press 1. For weather, press 2.For Stargazer astrophysics news, press 3.

The <enumerate> element may be used within the promptsand the catch elements associated with <menu> elements andwith <field> elements that contain <option> elements,as discussed inSection2.3.1.3. An error.semantic event is thrown if<enumerate> is used elsewhere (for example,<enumerate> within an <enumerate>).

2.2.5Grammar Generation

Anychoice phrase specifies a set of words and phrasesto listen for. A choice phrase is constructed from the PCDATA ofthe elements contained directly or indirectly in a <choice>element of a <menu>, or in the <option> element of a<field>.

If the accept attribute is "exact" then the user must say theentire phrase in the same order in which they occur in the choicephrase.

If the accept attribute is "approximate", then the choice maybe matched when a user says a subphrase of the expression. Forexample, in response to the prompt "Stargazer astrophysics news"a user could say "Stargazer", "astrophysics", "Stargazer news","astrophysics news", and so on. The equivalent grammar may belanguage and platform dependent.

As an example of using "exact" and "approximate" in differentchoices, consider this example:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><menu accept="approximate">  <choice next="http://www.stargazer.example.com/voice/astronews.vxml">    Stargazer Astrophysics News </choice>  <choice accept="exact"  next="http://www.physicsweekly.com/voice/example.vxml">    Physics Weekly  </choice>  <choice accept="exact"  next="http://www.particlephysics.com/voice/example.vxml">   Particle Physics Update </choice>  <choice next="http://www.astronomytoday.com/voice/example.vxml">   Astronomy Today </choice></menu></vxml>

Because "approximate" is specified for the first choice, theuser may say a subphrase when matching the first choice; forinstance, "Stargazer" or "Astrophysics News". However, because"exact" is specified in the second and third choices, only acomplete phrase will match: "Physics Weekly" and "ParticlePhysics Update".

2.2.6Interpretation Model

A menu behaves like a form with a single field that does allthe work. The menu prompts become field prompts. The menu eventhandlers become the field event handlers. The menu grammarsbecomeform grammars. As with forms, grammar matches inmenu will update the application.lastresult$ array. Thesevariables are described inSection 5.1.5. Generated grammars must alwaysproduce simple results whose interpretation and utterance valuesare identical.

Upon entry, the menu's grammars are built and enabled,and the prompt is played. When the user input matches a choice,control transitions according to the value of the next, expr,event or eventexpr attribute of the <choice>, only one ofwhich may be specified. If an event attribute is specified butits event handler does not cause the interpreter to exit ortransition control, then the FIA will clear the form itemvariable of the menu's anonymous field, causing the menu to beexecuted again.

2.3 FormItems

Aform item is an element of a <form> that can bevisited during form interpretation. These elements are<field>, <block>, <initial>, <subdialog>,<object>, <record>, and <transfer>.

All form items have the following characteristics:

They have a result variable, specified by the name attribute.This variable may be given an initial value with the exprattribute.
They have a guard condition specified with the cond attribute.A form item is visited if it is not filled and its cond is notspecified or evaluates, after conversion to boolean, to true.

Form items are subdivided intoinput items, those thatdefine the form's input item variables, andcontrolitems, those that help control the gathering of theform's input items. Input items (<field>,<subdialog>, <object>, <record>, and<transfer>) generally may contain the followingelements:

<filled> elements containing some action to executeafter the result input item variable is filled in.
<property> elements to specify properties that are ineffect for this input item (the <initial> form item canalso contain this element).
<prompt> elements to specify prompts to be played whenthis element is visited.
<grammar> elements to specify allowable spoken andcharacter input for this input item (<subdialog> and<object> cannot contain this element).
<catch> elements and catch shorthands that are in effectfor this input item (the <initial> form item can alsocontain this element).

Each input item may have an associated set ofshadowvariables. Shadow variables are used to return results fromthe execution of an input item, other than the value stored underthe name attribute. For example, it may be useful to know theconfidence level that was obtained as a result of a recognizedgrammar in a <field> element. A shadow variable isreferenced asname$.shadowvar wherename isthe value of the form item's name attribute, andshadowvar is the name of a specific shadow variable.Shadow variables are writeable and can be modified by theapplication. For example, the <field> element returns ashadow variable confidence. The example below illustrates howthis shadow variable is accessed.

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form><field name="state">  <prompt> Please say the name of a state. </prompt>  <grammar src="http://mygrammars.example.com/states.gram"              type="application/srgs"/>  <filled>    <if cond="state$.confidence &lt; 0.4">      <throw event="nomatch"/>    </if>  </filled></field></form></vxml>

In the example, the confidence of the result is examined, andthe result is rejected if the confidence is too low.

2.3.1 field element

A field specifies an input item to be gathered from the user.The field element's attributes are:

Table 9: <field>Attributes
name	The form item variable in the dialogscope that will hold the result. The name must be unique amongform items in the form. If the name is not unique, then abadfetch error is thrown when the document is fetched. The namemust conform to the variable naming conventions inSection 5.1.
expr	The initial value of the form itemvariable; default is ECMAScript undefined. If initialized to avalue, then the form item will not be visited unless the formitem variable is cleared.
cond	An expression that must evaluate totrue after conversion to boolean in order for the form item to bevisited. The form item can also be visited if the attribute isnot specified.
type	The type of field, i.e., the name ofa builtin grammar type (seeAppendix P). Platform support for builtingrammar types is optional. If the specified builtin type isnot supported by the platform, an error.unsupported.builtin eventis thrown.
slot	The name of the grammar slot used topopulate the variable (if it is absent, it defaults to thevariable name). This attribute is useful in the case where thegrammar format being used has a mechanism for returning sets ofslot/value pairs and the slot names differ from the form itemvariable names.
modal	If this is false (the default) allactive grammars are turned on while collecting this field. Ifthis is true, then only the field's grammars are enabled:all others are temporarily disabled.

The shadow variables of a <field> element with the namename are given in Table 10. The values of the utterance,inputmode and interpretation shadow variables mustbe the same as those in application.lastresult$ (seeSection 5.1.5).

Table 10: <field> ShadowVariables
name$.utterance	The raw string of words that wererecognized. The exact tokenization and spelling isplatform-specific (e.g. "five hundred thirty" or "5 hundred 30"or even "530"). In the case of a DTMF grammar, this variable willcontain the matched digit string.
name$.inputmode	The mode in which user input wasprovided: dtmf or voice.
name$.interpretation	An ECMAScript variable containing theinterpretation as described inSection 3.1.5.
name$.confidence	The confidence level for thename field and may rangefrom 0.0-1.0. A value of 0.0 indicates minimum confidence, and avalue of 1.0 indicates maximum confidence. A platform may use the utterance confidence (the value ofapplication.lastresult$.confidence) as the value ofname$.confidence. This distinction between field andutterance level confidence is platform-dependent. More specific interpretation of a confidence value isplatform-dependent since its computation is likely to differbetween platforms.

2.3.1.1. Fields Using Explicit Grammars

Explicit grammars can be specified via a URI, which can beabsolute or relative:

<field name="flavor">  <prompt>What is your favorite ice cream?</prompt>  <grammar src="../grammars/ice_cream.grxml"           type="application/srgs+xml"/></field>

Grammars can be specified inline, for example using a W3C ABNFgrammar:

<field name="flavor">  <prompt>What is your favorite flavor?</prompt>  <help>Say one of vanilla, chocolate, or strawberry.</help>  <grammar mode="voice" type="application/srgs">   #ABNF 1.0;   $options = vanilla | chocolate | strawberry  </grammar></field>

If both the <grammar> src attribute and an inlinegrammar are specified, then an error.badfetch is thrown.

2.3.1.2. Explicit Grammars withPlatform-specific Builtins

Platform support for builtin resources such as speechgrammars, DTMF grammars and audio files is optional. Theseresources are accessed using platform-specific URIs, such as"http://localhost:5000/grammar/boolean", or platform-specificschemes such as the commonly used 'builtin' scheme,"builtin:grammar/boolean".

If a platform supports access to builtin resources, then itshould support access to fundamental builtin grammars (seeAppendix P); forexample

<grammar src="builtin:grammar/boolean"/><grammar src="builtin:dtmf/boolean"/>

where the first <grammar> references the builtin booleanspeech grammar, and the second references the builtin booleanDTMF grammar.

By definition the following:

<field type="sample">   <prompt>Prompt for builtin grammar</prompt></field>

is equivalent to the following platform-specific builtingrammars :

<field>   <grammar src="builtin:grammar/sample"/>   <grammar src="builtin:dtmf/sample"/>    <prompt>Prompt for builtin grammar</prompt></field>

wheresample is one of the fundamental builtin fieldtypes (e.g., boolean, date, etc.).

In addition, platform-specific builtin URI schemes may be usedto access grammars that are supported by particular interpretercontexts. It is recommended that platform-specific builtin grammarnames begin with the string "x-", as this namespace will not beused in future versions of the standard.

Examples of platform-specific builtin grammars:

<grammar src="builtin:grammar/x-sample"/><grammar src="builtin:dtmf/x-sample"/>

2.3.1.3. Fields Using Option Lists

When a simple set of alternatives is all that is needed tospecify the legal input values for a field, it may be moreconvenient to use an option list than a grammar. An option listis represented by a set of <option> elements contained in a<field> element. Each <option> element containsPCDATA that is used to generate a speech grammar. This followsthe grammar generation method described for <choice> inSection 2.2.5 . Attributes maybe used to specify a DTMF sequence for each option and to controlthe value assigned to the field's form item variable. When anoption is chosen, the value attribute determines theinterpretation value for the field's shadow variable and forapplication.lastresult$.

The following field offers the user three choices and assignsthe value of the value attribute of the selected option to themaincourse variable:

<field name="maincourse">  <prompt>      Please select an entree. Today, we are featuring <enumerate/>  </prompt>  <option dtmf="1" value="fish"> swordfish </option>  <option dtmf="2" value="beef"> roast beef </option>  <option dtmf="3" value="chicken"> frog legs </option>  <filled>    <submit next="/cgi-bin/maincourse.cgi"        method="post" namelist="maincourse"/>  </filled></field>

This conversation might sound like:

C: Please select an entree. Today, we're featuringswordfish; roast beef; frog legs.
H: frog legs
C:(assigns "chicken" to "maincourse",then submits "maincourse=chicken" to /maincourse.cgi)

The following example shows proper and improper use of<enumerate> in a catch element of a form with severalfields containing <option> elements:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>  <block>    We need a few more details to complete your order.  </block>  <field name="color">    <prompt>Which color?</prompt>    <option>red</option>    <option>blue</option>    <option>green</option>  </field>  <field name="size">    <prompt>Which size?</prompt>    <option>small</option>    <option>medium</option>    <option>large</option>  </field>  <field name="quantity">    <grammar type="application/srgs+xml" src="/grammars/number.grxml"/>    <prompt>How many?</prompt>  </field>  <block>    Thank you.  Your order is being processed.    <submit next="details.cgi" namelist="color size quantity"/>  </block>  <catch event="help nomatch">    Your options are <enumerate/>.  </catch></form></vxml>

A scenario might be:

C: We need a few more details to complete your order. Whichcolor?

H: help. (throws "help" event caught by form-level<catch>)

C: Your options are red, blue, green.

H: red.

C: Which size?

H: 7 (throws "nomatch" event caught by form-level<catch>)

C: Your options are small, medium, large.

H: small.

In the steps above, the <enumerate/> in the form-levelcatch had something to enumerate: the <option> elements inthe "color" and "size" <field> elements. The next<field>, however, is different:

C: How many?

H: a lot. (throws "nomatch" event caught by form-level<catch>)

The form-level <catch>'s use of <enumerate> causesan "error.semantic" event to be thrown because the "quantity"<field> does not contain any <option> elements thatcan be enumerated.

One solution is to add a field-level <catch> to the"quantity" <field>:

<catch event="help nomatch">       Please say the number of items to be ordered.</catch>

The "nomatch" event would then be caught locally, resulting inthe following possible completion of the scenario:

C: Please say the number of items to be ordered.

H: 50

C: Thank you. Your order is being processed.

The <enumerate> element is also discussed inSection 2.2.4.

The attributes of <option> are:

Table 11: <option>Attributes
dtmf	An optional DTMF sequence forthis option. It is equivalent to a simple DTMF<grammar> and DTMF properties (Section 6.3.3) apply to recognition of thesequence. Unlike DTMF grammars, whitespace is optional:dtmf="123#" is equivalent to dtmf="1 2 3 #". If unspecified,no DTMF sequence is associated with this option so it cannotbe matched using DTMF.
accept	When set to "exact" (the default),the text of the option element defines the exact phrase to berecognized. When set to "approximate", the text of the optionelement defines an approximate recognition phrase (as describedinSection 2.2.5).
value	The string to assign to thefield's form item variable when a user selects this option,whether by speech or DTMF. The default assignment is the CDATAcontent of the <option> element with leading and trailingwhite space removed. If this does not exist, then the DTMFsequence is used instead. If neither CDATA content nor adtmf sequence is specified, then the default assignment isundefined and the field's form item variable is notfilled.

The use of <option> does not preclude the simultaneoususe of <grammar>. The result would be the match from either'grammar', not unlike the occurrence of two <grammar>elements in the same <field> representing a disjunction ofchoices.

2.3.2 block element

This element is a form item. It contains executable contentthat is executed if the block's form item variable isundefined and the block's cond attribute, if any, evaluates totrue.

<block>   Welcome to Flamingo, your source for lawn ornaments.</block>

The form item variable is automatically set to true justbefore the block is entered. Therefore, blocks are typicallyexecuted just once per form invocation.

Sometimes you may need more control over blocks. To do this,you can name the form item variable, and set or clear it tocontrol execution of the <block>. This variable is declaredin the dialog scope of the form.

Attributes of <block> include:

Table 12: <block>Attributes
name	The name of the form item variableused to track whether this block is eligible to be executed;defaults to an inaccessible internal variable.
expr	The initial value of the form itemvariable; default is ECMAScript undefined. If initialized to avalue, then the form item will not be visited unless the formitem variable is cleared.
cond	An expression that must evaluate totrue after conversion to boolean in order for the form item to bevisited.

2.3.3. initial element

In a typical mixed initiative form, an <initial> elementis visited when the user is initially being prompted forform-wide information, and has not yet entered into the directedmode where each field is visited individually. Like input items,it has prompts, catches, and event counters. Unlike input items,<initial> has no grammars, and no <filled> action.For instance:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>   <grammar src="http://www.directions.example.com/grammars/from_to.grxml"        type="application/srgs+xml"/>   <block>       Welcome to the Driving Directions By Phone.   </block>   <initial name="bypass_init">     <prompt>       Where do you want to drive from and to?     </prompt>     <nomatch count="1">        Please say something like "from Atlanta Georgia to Toledo Ohio".     </nomatch>     <nomatch count="2">        I'm sorry, I still don't understand.        I'll ask you for information one piece at a time.        <assign name="bypass_init" expr="true"/>        <reprompt/>     </nomatch>   </initial>   <field name="from_city">     <grammar src="http://www.directions.example.com/grammars/city.grxml"               type="application/srgs+xml"/>     <prompt>From which city are you leaving?</prompt>   </field>   <field name="to_city">     <grammar src="http://www.directions.example.com/grammars/city.grxml"               type="application/srgs+xml"/>     <prompt>Which city are you going to?</prompt>   </field></form></vxml>

If an event occurs while visiting an <initial>, then oneof its event handlers executes. As with other form items,<initial> continues to be eligible to be visited while itsform item variable is undefined and while its cond attribute istrue. If one or more of the input item variables is set by userinput, then all <initial> form item variables are set totrue, before any <filled> actions are executed.

An <initial> form item variable can be manipulatedexplicitly to disable, or re-enable the <initial>'seligibility to the FIA. For example, in the program above, the<initial>'s form item variable is set on the second nomatchevent. This causes the FIA to no longer consider the<initial> and to choose the next form item, which is a<field> to prompt explicitly for the origination city.Similarly, an <initial>'s form item variable could becleared, so that <initial> gets selected again by theFIA.

More than one <initial> may be specified in the sameform. When the form is entered only the first <initial> indocument order that is eligible according to its cond attributewill be visited. After the first form item variable is filled,all <initial> form item variables are set to true so thatthey are not visited. Explicitly clearing the <initial>scan allow them to be reused, and even allow a different<initial> to be selected on subsequent iterations of theFIA.

The cond attribute can also be used to select which<initial> to use in a given iteration. An application couldprovide multiple <initial>s but mark them for use onlyunder special circumstances by using their cond attribute; forexample, if the cond attribute were used to test for noviceversus advanced operation mode, and only use the <initial>sin advanced mode. Furthermore, if the first <initial> indocument order specified a value for its cond attribute which wasnever fulfilled, then it would never be executed. If all<initial>s had cond values which prevented their selection,then none would be executed.

Normal grammar scoping rules apply when visiting an<initial>, as described inSection 3.1.3.. In particular, no grammarsscoped to a <field> are active.

Note: explicit assignment of values to input item variablesdoes not affect the value of an <initial>'s form itemvariable.

Attributes of <initial> include:

Table 13: <initial>Attributes
name	The name of a form item variable usedto track whether the <initial> is eligible to execute;defaults to an inaccessible internal variable.
expr	The initial value of the form itemvariable; default is ECMAScript undefined. If initialized to avalue, then the form item will not be visited unless the formitem variable is cleared.
cond	An expression that must evaluate totrue after conversion to boolean in order for the form item to bevisited.

2.3.4 subdialog element

Subdialogs are a mechanism for reusing common dialogs andbuilding libraries of reusable applications.

The <subdialog> element invokes a 'called' dialog (knownas thesubdialog) identified by its src or srcexprattribute in the 'calling' dialog. The subdialog executes in anew execution context that includes all the declarations andstate information for the subdialog, the subdialog's document,and the subdialog's application root (if present), with countersreset, and variables initialized. The subdialog proceeds untilthe execution of a <return> or <exit> element, oruntil no form items remain eligible for the FIA to select(equivalent to an <exit>). A <return> element causescontrol and data to be returned to the calling dialog (Section 5.3.10). When thesubdialog returns, its execution context is deleted, andexecution resumes in the calling dialog with any appropriate<filled> elements.

The subdialog context and the context of the called dialog areindependent, even if the dialogs are in the same document.Variables in the scope chain of the calling dialog are not sharedwith the called subdialog: there is no sharing of variableinstances between execution contexts. Even when the subdialog isspecified in the same document as the calling dialog, itsexecution context contains different variable instances. When thesubdialog and calling dialog are in different documents but sharea root document, the subdialog's root variables are likewisedifferent instances. All variable bindings applied in thesubdialog context are lost on return to the calling context.

Within the subdialog context, however, normal scoping rulesfor grammars, events and variables apply. Active grammars in asubdialog include default grammars defined by the interpretercontext and appropriately scoped grammars in <link>,<menu> and <form> elements in the subdialog'sdocument and its root document. Event handling and variablebinding likewise follow the standard scoping hierarchy.

From a programming perspective, subdialogs behave differentlyfrom subroutines because the calling and called contexts areindependent. While a subroutine can access variable instances inits calling routine, a subdialog cannot access the same variableinstance defined in its calling dialog. Similarly, subdialogs donot follow the event percolation model in languages like Javawhere an event thrown in a method automatically percolates up tothe calling context if not handled in the called context. Eventsthrown in a subdialog are treated by event handlers definedwithin its context; they can only be passed to the callingcontext by a local event handler which explicitly returns theevent to the calling context (seeSection 5.3.10).

The subdialog is specified by the URI reference in the<subdialog>'s src or srcexpr attribute (see[RFC2396]). If thisURI reference contains an absolute or relative URI, which mayinclude a query string, then that URI is fetched and thesubdialog is found in the resulting document. If the<subdialog> has a namelist attribute, then those variablesare added to the query string of the URI.

If the URI reference contains only a fragment (i.e., noabsolute or relative URI), and if there is no namelist attribute,then there is no fetch: the subdialog is found in the currentdocument.

The URI reference's fragment, if any, specifies the subdialogto invoke. When there is no fragment, the subdialog invoked isthe lexically first dialog in the document.

If the URI reference is not valid (i.e. the dialog or documentdoes not exist), an error.badfetch must be thrown. Note that forerrors which occur during a dialog or document transition, thescope in which errors are handled is platform specific.

The attributes are:

Table 14: <subdialog>Attributes
name	The result returned from thesubdialog, an ECMAScript object whose properties are the onesdefined in the namelist attribute of the <return>element.
expr	The initial value of the form itemvariable; default is ECMAScript undefined. If initialized to avalue, then the form item will not be visited unless the formitem variable is cleared.
cond	An expression that must evaluate totrue after conversion to boolean in order for the form item to bevisited.
namelist	The list of variables to submit. Thedefault is to submit no variables. If a namelist is supplied, itmay contain individual variable references which are submittedwith the same qualification used in the namelist. DeclaredVoiceXML and ECMAScript variables can be referenced. If anundeclared variable is referenced in the namelist, then anerror.semantic is thrown (Section 5.1.1).
src	The URI of the subdialog.
srcexpr	An ECMAScript expression yielding theURI of the subdialog
method	SeeSection 5.3.8.
enctype	SeeSection 5.3.8.
fetchaudio	SeeSection 6.1. This defaults to the fetchaudioproperty.
fetchtimeout	SeeSection 6.1. This defaults to the fetchtimeoutproperty.
fetchhint	SeeSection 6.1. This defaults to thedocumentfetchhint property
maxage	SeeSection 6.1. This defaults to the documentmaxageproperty.
maxstale	SeeSection 6.1. This defaults to thedocumentmaxstale property.

Exactly one of "src" or "srcexpr" must be specified;otherwise, an error.badfetch event is thrown.

The <subdialog> element may contain elements common toall form items, and may also contain <param> elements.The <param> elements of a <subdialog> specify theparameters to pass to the subdialog. These parameters must bedeclared as <var> elements in the form executed asthe subdialog or an error.semantic will be thrown. When asubdialog initializes, the subdialog's form level<var> elements are initialized in document order tothe value specified by the <param> element with thecorresponding name. The parameter values are computed byevaluating the <param> expr attribute in the context ofthe <param> element. An expr attribute in the <var>element is ignored in this case. If no corresponding<param> is specified to <var> element, an exprattribute is used as a default value, or the variable isundefined if the expr attribute is unspecified as with theregular <form> element.

In the example below, the birthday of an individual is used tovalidate their driver's license. The src attribute of thesubdialog refers to a form that is within the same document. The<param> element is used to pass the birthday value to thesubdialog.

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><!-- form dialog that calls a subdialog --><form>  <subdialog name="result" src="#getdriverslicense">   <param name="birthday" expr="'2000-02-10'"/>   <filled>     <submit next="http://myservice.example.com/cgi-bin/process"/>   </filled>  </subdialog></form><!-- subdialog to get drivers license --><form>  <var name="birthday"/>  <field name="drivelicense">   <grammar src="http://grammarlib/drivegrammar.grxml"      type="application/srgs+xml"/>   <prompt> Please say your drivers license number. </prompt>   <filled>     <if cond="validdrivelicense(drivelicense,birthday)">       <var name="status" expr="true"/>     <else/>       <var name="status" expr="false"/>     </if>     <return namelist="drivelicense status"/>   </filled>  </field></form></vxml>

The driver's license value is returned to callingdialog, along with a status variable in order to indicate whetherthe license is valid or not.

This example also illustrates the convenience of using<param> as a means for forwarding data to the subdialog asa means of instantiating values in the subdialog without usingserver side scripting. An alternate solution that uses scripting,is shown below.

Document with form that calls a subdialog

<?xml version="1.0" encoding="UTF-8"?><vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">  <form>   <field name="birthday">     <grammar type="application/srgs+xml" src="/grammars/date.grxml"/>     What is your birthday?   </field>   <subdialog name="result"          src="/cgi-bin/getlib#getdriverslicense"          namelist="birthday">     <filled>       <submit next="http://myservice.example.com/cgi-bin/process"/>     </filled>   </subdialog>  </form></vxml>

Document containing the subdialog(generated by /cgi-bin/getlib)

<?xml version="1.0" encoding="UTF-8"?><vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">  <form>    <var name="birthday" expr="'1980-02-10'"/>    <!-- Generated by server script -->   <field name="drivelicense">     <grammar src="http://grammarlib/drivegrammar.grxml"        type="application/srgs+xml"/>     <prompt>       Please say your drivers license number.     </prompt>     <filled>       <if cond="validdrivelicense(drivelicense,birthday)">         <var name="status" expr="true"/>       <else/>         <var name="status" expr="false"/>       </if>       <return namelist="drivelicense status"/>     </filled>   </field>  </form></vxml>

In the above example, a server side script had to generate thedocument and embed the birthday value.

One last example is shown below that illustrates a subdialogto capture general credit card information. First the subdialogis defined in a separate document; it is intended to be reusableacross different applications. It returns a status, the creditcard number, and the expiry date; if a result cannot be obtained,the status is returned with value "no_result".

<?xml version="1.0" encoding="UTF-8"?><vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">   <!-- Example of subdialog to collect credit card information. -->   <!-- file is at http://www.somedomain.example.com/ccn.vxml -->   <form>     <var name="status" expr="'no_result'"/>     <field name="creditcardnum">       <prompt>         What is your credit card number?       </prompt>       <help>         I am trying to collect your credit card information.         <reprompt/>       </help>       <nomatch>         <return namelist="status"/>       </nomatch>       <grammar src="ccn.grxml" type="application/srgs+xml"/>     </field>     <field name="expirydate">       <grammar type="application/srgs+xml" src="/grammars/date.grxml"/>       <prompt>         What is the expiry date of this card?       </prompt>       <help>         I am trying to collect the expiry date of the credit         card number you provided.         <reprompt/>       </help>       <nomatch>         <return namelist="status"/>       </nomatch>     </field>     <block>       <assign name="status" expr="'result'"/>       <return namelist="status creditcardnum expirydate"/>     </block>   </form></vxml>

An application that includes a calling dialog is shown below.It obtains the name of a software product and operating systemusing a mixed initiative dialog, and then solicits credit cardinformation using the subdialog.

<?xml version="1.0" encoding="UTF-8"?><vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">  <!-- Example main program -->  <!-- http://www.somedomain.example.com/main.vxml -->  <!-- calls subdialog ccn.vxml -->  <!-- assume this gets defined by some dialog -->  <var name="username"/>  <form>    <var name="ccn"/>    <var name="exp"/>    <grammar src="buysoftware.grxml" type="application/srgs+xml"/>    <initial name="start">      <prompt>        Please tell us the software product you wish to buy        and the operating system on which it must run.      </prompt>      <noinput>        <assign name="start" expr="true"/>      </noinput>    </initial>    <field name="product">      <prompt>        Which software product would you like to buy?      </prompt>    </field>    <field name="operatingsystem">      <prompt>        Which operating system does this software need to run on?      </prompt>    </field>    <subdialog name="cc_results"        src="http://somedomain.example.com/ccn.vxml">      <filled>        <if cond="cc_results.status=='no_result'">          Sorry, your credit card information could not be          Obtained. This order is cancelled.          <exit/>        <else/>          <assign name="ccn" expr="cc_results.creditcardnum"/>          <assign name="exp" expr="cc_results.expirydate"/>        </if>      </filled>    </subdialog>    <block>      We will now process your order. Please hold.      <submit next="www.somedomain.example.com/process_order.asp"              namelist="username product operatingsystem ccn exp"/>    </block>  </form></vxml>

2.3.5. object element

A VoiceXML implementation platform may exposeplatform-specific functionality for use by a VoiceXML applicationvia the <object> element. The <object> element makesdirect use of its own content during initialization (e.g.<param> child element) and execution. As a result,<object> content cannot be treated as alternative content.Notice that like other input items, <object> has promptsand catch elements. It may also have <filled> actions.

For example, a platform-specific credit card collection objectcould be accessed like this:

<object    name="debit"    classid="method://credit-card/gather_and_debit"    data="http://www.recordings.example.com/prompts/credit/jesse.jar">  <param name="amount" expr="document.amt"/>  <param name="vendor" expr="vendor_num"/></object>

In this example, the <param> element (Section 6.4) is used to pass parameters to theobject when it is invoked. When this <object> is executed,it returns an ECMAScript object as the value of its form itemvariable. This <block> presents the values returned fromthe credit card object:

<block>   <prompt>     The card type is <value expr="debit.card"/>.   </prompt>   <prompt>     The card number is <value expr="debit.card_no"/>.   </prompt>   <prompt>     The expiration date is <value expr="debit.expiry_date"/>.   </prompt>   <prompt>     The approval code is <value expr="debit.approval_code"/>.   </prompt>   <prompt>The confirmation number is     <value expr="debit.conf_no"/>.   </prompt></block>

As another example, suppose that a platform has a feature thatallows the user to enter arbitrary text messages using atelephone keypad.

<?xml version="1.0" encoding="UTF-8"?><vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>  <object name="message"      classid="builtin://keypad-text-input">   <prompt>     Enter your message by pressing your keypad once     per letter.  For a space, enter star.  To end the     message, press the pound sign.   </prompt>  </object>  <block>    <assign name="document.pager_message" expr="message.text"/>    <goto next="#confirm_pager_message"/>  </block></form></vxml>

The user is first prompted for the pager message, then keys itin. The <block> copies the message to the variabledocument.pager_message.

Attributes of <object> include:

Table 15: <object>Attributes
name	When the object is evaluated, it setsthis variable to an ECMAScript value whose type is defined by theobject.
expr	The initial value of the form itemvariable; default is ECMAScript undefined. If initialized to avalue, then the form item will not be visited unless the formitem variable is cleared.
cond	An expression that must evaluate totrue after conversion to boolean in order for the form item to bevisited.
classid	The URI specifying the location ofthe object's implementation. The URI conventions areplatform-dependent.
codebase	The base path used to resolverelative URIs specified by classid, data, and archive. Itdefaults to the base URI of the current document.
codetype	The content type of data expectedwhen downloading the object specified by classid. When absent itdefaults to the value of the type attribute.
data	The URI specifying the location ofthe object's data. If it is a relative URI, it isinterpreted relative to the codebase attribute.
type	The content type of the dataspecified by the data attribute.
archive	A space-separated list of URIs forarchives containing resources relevant to the object, which mayinclude the resources specified by the classid and dataattributes. URIs which are relative are interpreted relative tothe codebase attribute.
fetchhint	SeeSection 6.1. This defaults to theobjectfetchhint property.
fetchtimeout	SeeSection 6.1. This defaults to the fetchtimeoutproperty.
maxage	SeeSection 6.1. This defaults to the objectmaxageproperty.
maxstale	SeeSection 6.1. This defaults to the objectmaxstaleproperty.

There is no requirement for implementations to provideplatform-specific objects, although implementations must handlethe <object> element by throwingerror.unsupported.objectname if the particular platform-specificobject is not supported (note that 'objectname' inerror.unsupported.objectname is a fixed string, so notsubstituted with the name of the unsupported object). Ifan implementation does this, it is considered to be supportingthe <object> element.

The object itself is responsible for determining whetherparameter names or values it receives are invalid. If so, the<object> element throws an error. The error may be eitherobject-specific or one of the standard errors listed inSection 5.2.6.

2.3.6. record element

The <record> element is an input item that collects arecording from the user. A reference to the recorded audio isstored in the input item variable, which can be played back(using the expr attribute on <audio>) or submitted to aserver, as shown in this example:

<?xml version="1.0" encoding="UTF-8"?><vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">   <form>     <property name="bargein" value="true"/>     <block>       <prompt>         Riley is not available to take your call.       </prompt>     </block>     <record  name="msg" beep="true" maxtime="10s"       finalsilence="4000ms" dtmfterm="true" type="audio/x-wav">       <prompt timeout="5s">         Record a message after the beep.       </prompt>       <noinput>         I didn't hear anything, please try again.       </noinput>     </record>     <field name="confirm">       <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>       <prompt>          Your message is <audio expr="msg"/>.       </prompt>       <prompt>         To keep it, say yes.  To discard it, say no.       </prompt>       <filled>         <if cond="confirm">           <submit next="save_message.pl" enctype="multipart/form-data"               method="post" namelist="msg"/>         </if>         <clear/>       </filled>     </field>   </form></vxml>

The user is prompted to record a message, and then records it.The recording terminates when one of the following conditions ismet: the interval of final silence occurs, a DTMF key is pressed,the maximum recording time is exceeded, or the caller hangs up.The recording is played back, and if the user approves it, issent on to the server for storage using the HTTP POST method.Notice that like other input items, <record> has grammar,prompt and catch elements. It may also have <filled>actions.

Timing diagram showing an example of prompting a user for input,then recording the user's voice.
Figure 7: Timing of prompts, audio recording, and DTMF input

When a user hangs up during recording, the recordingterminates and a connection.disconnect.hangup event is thrown.However, audio recorded up until the hangup is available throughthe <record> variable. Applications, such as simplevoicemail services, can then return audio data to a server evenafter disconnection:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>   <record  name="msg" beep="true" maxtime="10s"       finalsilence="4000ms" dtmfterm="true" type="audio/x-wav">       <prompt timeout="5s">         Record a message after the beep.       </prompt>       <noinput>         I didn't hear anything, please try again.       </noinput>       <catch event="connection.disconnect.hangup">           <submit next="./voicemail_server.asp"/>       </catch>   </record></form></vxml>

Arecording begins at the earliest after the playbackof any prompts (including the 'beep' tone if defined). As anoptimization, a platform may begin recording when the user startsspeaking.

A timeout interval is defined to begin immediately afterprompt playback (including the 'beep' tone if defined) and itsduration is determined by the 'timeout' property. If the timeoutinterval is exceeded before recording begins, then a<noinput> event is thrown.

A maxtime interval is defined to begin when recording beginsand its duration is determined by a 'maxtime' attribute. If themaxtime interval is exceeded before recording ends, then therecording is terminated and the maxtime shadow variable is set to'true'.

Arecording ends when an event is thrown, DTMF orspeech input matches an active grammar, or the maxtime intervalis exceeded. As an optimization, a platform may end recordingafter a silence interval (set by the 'finalsilence' attribute)indicating the user has stopped speaking.

If no audio is collected during execution of <record>,then the record variable remains unfilled (note). This can occur,for example, when DTMF or speech input is received during promptplayback or before the timeout interval expires. In particular,if no audio is collected before the user terminates recordingwith DTMF input matching a local DTMF grammar (or when thedtmfterm attribute is set to true), then the record variable isnot filled (so shadow variables are not set), and the FIA appliesas normal without a noinput event being thrown. However,information about the input may be available in these situationsvia application.lastresult$as described inSection5.1.5.

The <record> element contains a 'dtmfterm' attribute asa developer convenience. A 'dtmfterm' attribute with the value'true' is equivalent to the definition of a local DTMF grammarwhich matches any DTMF input. The dtmfterm attribute haspriority over specified local DTMF grammars.

Any DTMF keypress matching an active grammar terminatesrecording. DTMF keypresses not matching an active grammar areignored (and therefore do not terminate or otherwise affectrecording) and may optionally be removed from the signal by theplatform.

Platform support for recognition of speech grammars duringrecording is optional. If the platform supports simultaneousrecognition and recording, then spoken input matching anactive non-local speech grammar terminates recording and theFIA is invoked, transferring execution to the elementcontaining the grammar. The 'terminating' speech input isaccessible via application.lastresult$. The audio of therecognized 'terminating' speech input is not available and isnot part of the recording. Note that, unlike DTMF, speechrecognition input cannot be used just to terminate recording:if local speech grammars are specified, they are treated asinactive (i.e. they are ignored), even if the platform supportssimultaneous recognition and recording.

If the termination grammar matched is a local grammar, therecording is placed in the record variable. Otherwise, the recordvariable is left unfilled (note) and the form interpretation algorithm isinvoked. In each case, application.lastresult$ is assigned.

note Although the record variable is notfilled with a recording in this case, a match of a non-localgrammar may nevertheless result in an assignment of some value tothe record variable (seeSection3.1.6).

The attributes of <record> are:

Table 16: <record>Attributes
name	The input item variable that will hold the recording. Note that how this variable is implemented may vary betweenplatforms (although all platforms must support its behaviour in<audio> and <submit> as described in thisspecification).
expr	The initial value of the form itemvariable; default is ECMAScript undefined. If initialized to avalue, then the form item will not be visited unless the formitem variable is cleared.
cond	An expression that must evaluate totrue after conversion to boolean in order for the form item to bevisited.
modal	If this is true (the default) allnon-local speech and DTMF grammars are not active while makingthe recording. If this is false, non-local speech and DTMFgrammars are active.
beep	If true, a tone is emitted just priorto recording. Defaults to false.
maxtime	The maximum duration to record.The value is a Time Designation (seeSection 6.5). Defaults to aplatform-specific value.
finalsilence	The interval of silence thatindicates end of speech. The value is a Time Designation(seeSection 6.5).Defaults to a platform-specific value.
dtmfterm	If true, any DTMF keypress notmatched by an active grammar will be treated as a match of anactive (anonymous) local DTMF grammar. Defaults to true.
type	The media format of the resultingrecording. Platforms must support the audio file formatsspecified inAppendixE (other formats may also be supported). Defaults to aplatform-specific format which should be one of the requiredformats.

The <record> element has the following shadow variablesset after the recording has been made:

Table 17: <record> ShadowVariables
name$.duration	The duration of the recording inmilliseconds.
name$.size	The size of the recording inbytes.
name$.termchar	If the dtmfterm attribute is true,and the user terminates the recording by pressing a DTMF key,then this shadow variable is the key pressed (e.g. "#").Otherwise it is undefined.
name$.maxtime	Boolean, true if the recording wasterminated because the maxtime duration was reached.

2.3.7. transfer element

The <transfer> element directs the interpreter toconnect the caller to another entity (e.g. telephone line oranother voice application). During the transfer operation, thecurrent interpreter session is suspended.

There are a variety of ways an implementation platform caninitiate a transfer, including "bridge", "blind", network-basedredirect (sometimes referred to as "take back and transfer"),"switchhook transfer", etc. Bridge and blind transfer types aresupported; the others are highly dependent upon specific platformand network features and configuration and therefore are outsidethe scope of this specification.

The <transfer> element is optional, though platformsshould support it. Platforms that support <transfer> maysupport bridge or blind transfer types, or both. Platforms thatsupport either type of transfer may optionally support bargeininput modes of DTMF, speech recognition, or both, during the calltransfer to drop the far-end. Blind transfer attempts can onlybe cancelled up to the point the outgoing call begins.

Attributes are:

Table 18: <transfer>Attributes
name	Stores the outcome of a bridgetransfer attempt. In the case of a blind transfer, this variableis undefined.
expr	The initial value of the form itemvariable; default is ECMAScript undefined. If initialized to avalue, then the form item will not be visited unless the formitem variable is cleared.
cond	An expression that must evaluate totrue in order for the form item to be visited.
dest	The URI of the destination(telephone, IP telephony address). Platforms must support thetel: URL syntax described in[RFC2806] and may support other URI-basedaddressing schemes.
destexpr	An ECMAScript expression yielding theURI of the destination.
bridge	Determines whether the platform remains in the connection withthe caller and callee. bridge="true" Bridge transfer. The platform adds the callee to theconnection. Document interpretation suspends until thetransferred call terminates. The platform remains in theconnection for the duration of the transferred call; listeningduring transfer is controlled by any included<grammar>s. If the caller disconnects by going onhook or if the networkdisconnects the caller, the platform throws aconnection.disconnect.hangup event. If the connection is released for any other reason, thatoutcome is reported in the name attribute (see the followingtable). bridge="false" Blind transfer (default). The platform redirects thecaller to the callee without remaining in the connection, anddoes not monitor the outcome. The platform throws a connection.disconnect.transferimmediately, regardless of whether the transfer was successful ornot.
connecttimeout	The time to wait while trying toconnect the call before returning the noanswer condition.The value is a Time Designation (seeSection 6.5). Only applies if bridge istrue. Default is platform specific.
maxtime	The time that the callis allowed to last, or 0s if no limit is imposed. The value isa Time Designation (seeSection6.5). Only applies if bridge is true. Default is 0s.
transferaudio	The URI of audio source to play while the transfer attempt isin progress (before far-end answer). If the resource cannot be fetched, the error is ignored andthe transfer continues; what the caller hears isplatform-dependent.
aai	Application-to-application information. A string containingdata sent to an application on the far-end, available in thesession variable session.connection.aai. The transmission of aai data may depend upon signaling networkgateways and data translation (e.g. ISDN to SIP); the status ofdata sent to a remote site is not known or reported. Although all platforms must support the aai attribute,platforms are not required to send aai data and need not supportreceipt of aai data. Platforms that cannot receive aai data mustset the session.connection.aai variable to the ECMAScriptundefined value. The underlying transmission mechanism may imposedata length limits.
aaiexpr	An ECMAScript expression yielding the AAI data.

Exactly one of "dest" or "destexpr" may be specified;otherwise, an error.badfetch event is thrown. Likewise, exactlyone of "aai" or "aaiexpr" may be specified; otherwise, anerror.badfetch event is thrown.

2.3.7.1 Blind Transfer

With a blind transfer, an attempt is made to connect theoriginal caller with the callee. Any prompts preceding the<transfer>, as well as prompts within the <transfer>,are queued and played before the transfer attempt begins; bargeinproperties apply as normal.

The VoiceXML implementation platform is not part of the audio connection between the caller and callee after a blind transfer.
Figure 8: Audio Connections during a blind transfer:<transfer bridge="false">

Any audio source specified by the transferaudio attribute isignored since no audio can be played from the platform to thecaller during the transfer attempt. Whether the connection issuccessful or not, the implementation platform cannot regaincontrol of the connections.

Connection status is not available. For example, it is notpossible to know whether the callee was busy, when a successfulcall ends, etc. However, some error conditions may be reported ifknown to the platform, such as if the caller is not authorized tocall the destination, or if the destination URI is malformed.These are platform-specific, but should follow the namingconvention of other transfer form item variable values.

The caller can cancel the transfer attempt before the outgoingcall begins by barging in with a speech or DTMF command thatmatches an active grammar during the playback of any queuedaudio.

In this case, the form item variable is set, and the followingshadow variables are set:

Table 19: <transfer> ShadowVariables
name$.duration	The duration of a call transfer in seconds. The duration is 0 if a call attempt was terminated by thecaller (using a voice or DTMF command) before the outgoing callbegins.
name$.inputmode	The input mode of the terminating command (dtmf or voice), orundefined if the transfer was not terminated by a grammarmatch.
name$.utterance	The utterance text used if transferwas terminated by speech recognition input or the DTMF result ifthe transfer was terminated by DTMF input; otherwise it isundefined.

Also, the application.lastresult$ variable will be filled asdescribed inSection5.1.5.

If the caller disconnects by hanging up during a call transferattempt before the connection to the callee begins, aconnection.disconnect.hangup event will be thrown, and dialogexecution will transition to a handler for the hangup event (ifone exists). The form item variable, and thus shadow variables,will not be set.

Once the transfer begins and the interpreter disconnects fromthe session, the platform throws connection.disconnect.transferand document interpretation continues normally.

Any connection between the caller and callee remains in placeregardless of document execution.

Table 20: Blind TransferOutcomes
Action	Value of form item variable	Event or Error	Reason
transfer begins	undefined	connection.disconnect.transfer	An attempt has been madeto transfer the caller to another line and will not return.
caller cancels transferbefore outgoing call begins	near_end_disconnect		The caller cancelled thetransfer attempt via a DTMF or voice command before the outgoingcall begins (during playback of queued audio).
transfer ends	unknown		The transfer ended butthe reason is not known.

2.3.7.2 Bridge Transfer

For a bridge transfer, the platform connects the caller to thecallee in a full duplex conversation.

VoiceXML implementation platform (party B) involved in a bridge transfer between a callerand callee.
Figure 9: Audio Connections during a bridge transfer:<transfer bridge="true">

Any prompts preceding the <transfer>, as well asprompts within the <transfer>, are queued and played beforethe transfer attempt begins. The bargein control appliesnormally. Specification of bargeintype is ignored; "hotword" isset by default.

The caller can cancel the transfer attempt before the outgoingcall begins by barging in with a speech or DTMF command thatmatches an active grammar during the playback of any queuedaudio.

2.3.7.2.1 Listening for user input during atransfer

Platforms may optionally support listening for caller commandsto terminate the transfer by specifying one or more grammarsinside the <transfer> element. The <transfer>element is modal in that no grammar defined outside its scope isactive. The platform will monitor during playing of prompts andduring the entire length of the transfer connecting and talkingphases:

DTMF input from the caller matching an included DTMFgrammar
an utterance from the caller matching an included speechgrammar

A successful match will terminate the transfer (the connectionto the callee); document interpretation continues normally.An unsuccessful match is ignored. If no grammars are specified,the platform will not listen to input from the caller.

The platform does not monitor in-band signals or voice inputfrom the callee.

2.3.7.2.2 Handling caller, callee, or networkdisconnections

While attempting to connect to the callee, the platformmonitors call progress indicators (in-band and/or out-of-band,depending upon the particular connection type and protocols). Forthe duration of a successful transfer, the platform monitors for(out-of-band) telephony events, such as disconnect, on both calllegs.

If the callee disconnects, the caller resumes his session withthe interpreter. If the caller disconnects, the platformdisconnects the callee, and document interpretation continuesnormally. If both the caller and callee are disconnected by thenetwork, document interpretation continues normally.

The possible outcomes for a bridge transfer before theconnection to the callee is established are:

Table 21: Bridged Transfer Outcomes Priorto Connection Being Established
Action	Value of form item variable	Event	Reason
caller disconnects		connection.disconnect.hangup	The caller hung up.
caller disconnectscallee	near_end_disconnect		The caller forced thecallee to disconnect via a DTMF or voice command.
callee busy	busy		The callee was busy.
network busy	network_busy		An intermediate networkrefused the call.
callee does notanswer	noanswer		There was no answerwithin the time specified by the connecttimeout attribute.
---	unknown		The transfer ended butthe reason is not known.

The possible outcomes for a bridge transfer after theconnection to the callee is established are:

Table 22: Bridged Transfer Outcomes AfterConnection Established
Action	Value of form item variable	Event	Reason
caller disconnects		connection.disconnect.hangup	The caller hung up.
caller disconnects	near_end_disconnect		The caller forced thecallee to disconnect via a DTMF or voice command.
platform disconnectscallee	maxtime_disconnect		The callee was disconnected by the platform because the callduration reached the value of maxtime attribute.
network disconnectscallee	network_disconnect		The network disconnected the callee from the platform.
callee disconnects	far_end_disconnect		The callee hung up.
---	unknown		The transfer ended butthe reason is not known.

If the caller disconnects by hanging up (either during a calltransfer or call transfer attempt), the connection to the callee(if one exists) is dropped, a connection.disconnect.hangup eventwill be thrown, and dialog execution will transition to a handlerfor the hangup event (if one exists). The form item variable, andthus shadow variables, will not be set.

If execution of <transfer> continues normally, then itsform item variable is set, and the following shadow variableswill be set:

Table 23: <transfer> ShadowVariables
name$.duration	The duration of a call transfer in seconds. The duration is 0 if a call attempt was terminated by thecaller (using a voice or DTMF command) prior to beinganswered.
name$.inputmode	The input mode of the terminatingcommand (dtmf or voice) or undefined if the transfer was notterminated by a grammar match.
name$.utterance	The utterance text used if transferwas terminated by speech recognition input or the DTMF resultif the transfer was terminated by DTMF input; otherwise it isundefined.

If the transfer was terminated by speech recognition input,then application.lastresult$ is assigned as usual.

2.3.7.2.3 Audio during bridge transferattempt

During a bridge transfer, it might be desirable to play audioto the caller while the platform attempts to connect to thecallee. For example, an advertisement ("Buy Joe's Spicy ShrimpSauce") or informational message ("Your call is very important tous; please wait while we connect you to the next availableagent.") might be provided in place of call progress information(ringing, busy, network announcements, etc.).

At the point the outgoing call begins, audio specifiedby transferaudio begins playing. Playing of transferaudioterminates when the answer status of the far-end connection isdetermined. This status isn't always known, since the far-endswitch can play audio (such as a special information tone, busytone, network busy tone, or a recording saying the connectioncan't be made) with out actually "answering" the call.

If a specified audio file play duration is shorter than thetime it takes to connect the far-end, the caller may hearsilence, platform-specific audio, or call progress information,depending upon the platform.

2.3.7.3 Transfer Errors and Events

One of the following events may be thrown during atransfer:

Table 24: Events Thrown DuringTransfer
Event	Reason	Transfer type
connection.disconnect.hangup	The caller hung up.	bridge
connection.disconnect.transfer	An attempt has been madeto transfer the caller to another line and will not return.	blind

If a transfer attempt could not be made, one of the followingerrors will be thrown:

Table 25: Transfer Attempt ErrorEvents
Error	Reason	Transfer type
error.connection.noauthorization	The caller is not allowed to call thedestination.	blind and bridge
error.connection.baddestination	The destination URI ismalformed.	blind and bridge
error.connection.noroute	The platform is not ableto place a call to the destination.	bridge
error.connection.noresource	The platform cannotallocate resources to place the call.	bridge
error.connection.protocol.nnn	The protocol stack forthis connection raised an exception that does not correspond toone of the other error.connection events.	bridge
error.unsupported.transfer.blind	The platform does notsupport blind transfer.	blind
error.unsupported.transfer.bridge	The platform does notsupport bridge transfer.	bridge
error.unsupported.uri	The platform does notsupport the URI format used. The special variable _message (Section 5.2.2) will contain thestring "The URIx is not a supported URI format" wherex is the URI from the dest or destexpr <transfer>attributes.	blind and bridge

2.3.7.4 Example

The following example attempts to perform a bridge transferthe caller to a another party, and wait for that conversation toterminate. Prompts may be included before or within the<transfer> element. This may be used to inform the callerof what is happening, with a notice such as "Please wait while wetransfer your call." The <prompt> within the <block>,and the <prompt> within <transfer> are queued andplayed before actually performing the transfer. After the audioqueue is flushed, the outgoing call is initiated. By default, thecaller is connected to the outgoing telephony channel. The"transferaudio" attribute specifies an audio file to be played tothe caller in place of audio from the far-end until the far-endanswers. If the audio source is longer than the connect time, theaudio will stop playing immediately upon far-end answer.

Sequence and timing diagram during a bridge transfer.
Figure 10: Sequence and timing during an example of a bridgetransfer

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>   <var name="mydur" expr="0"/>   <block>     <!-- queued and played before starting the transfer -->     <prompt>        Calling Riley. Please wait.     </prompt>   </block>   <!-- Play music while attempting to connect to far-end -->   <!-- "hotword" bargeintype during transferaudio only -->   <!-- Wait up to 60 seconds for the far end to answer  -->   <transfer name="mycall" dest="tel:+1-555-123-4567"      transferaudio="music.wav" connecttimeout="60s" bridge="true">     <!-- queued and played before starting the transfer -->     <!-- bargein properties apply during this prompt -->     <prompt>        Say cancel to disconnect this call at any time.     </prompt>     <!-- specify an external grammar to listen for "cancel" command -->     <grammar src="cancel.grxml" type="application/srgs+xml"/>     <filled>       <assign name="mydur" expr="mycall$.duration"/>         <if cond="mycall == 'busy'">           <prompt>             Riley's line is busy. Please call again later.           </prompt>         <elseif cond="mycall == 'noanswer'"/>           <prompt>             Riley can't answer the phone now. Please call             again later.           </prompt>         </if>     </filled>   </transfer>   <!-- submit call statistics to server -->   <block>      <submit namelist="mycall mydur" next="/cgi-bin/report"/>   </block></form></vxml>

2.4Filled

The <filled> element specifies an action to perform whensome combination of input items are filled. It may occur in twoplaces: as a child of the <form> element, or as a child ofan input item.

As a child of a <form> element, the <filled>element can be used to perform actions that occur when acombination of one or more input items is filled. For example,the following <filled> element does a cross-check to ensurethat a starting city field differs from the ending cityfield:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>  <field name="start_city">      <grammar src="http://www.grammars.example.com/voicexml/city.grxml"         type="application/srgs+xml"/>      <prompt>What is the starting city?</prompt>  </field>  <field name="end_city">      <grammar src="http://www.grammars.example.com/voicexml/city.grxml"         type="application/srgs+xml"/>      <prompt>What is the ending city?</prompt>  </field>  <filled mode="all" namelist="start_city end_city">    <if cond="start_city == end_city">      <prompt>        You can't fly from and to the same city.      </prompt>      <clear/>    </if>  </filled></form></vxml>

If the <filled> element appears inside an input item, itspecifies an action to perform after that input item is filledin:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>  <field name="city">    <grammar type="application/srgs+xml"      src="http://www.ship-it.example.com/grammars/served_cities.grxml"/>    <prompt>What is the city?</prompt>    <filled>      <if cond="city == 'Novosibirsk'">        <prompt>          Note, Novosibirsk service ends next year.        </prompt>      </if>    </filled>  </field></form></vxml>

After each gathering of the user's input, all the inputitems mentioned in the input are set, and then the interpreterlooks at each <filled> element in document order (nopreference is given to ones in input items vs. ones in the form).Those whose conditions are matched by the utterance are thenexecuted in order, until there are no more, or until onetransfers control or throws an event.

Attributes include:

Table 26: <filled>Attributes
mode	Either all (the default), or any. Ifany, this action is executed when any of the specified inputitems is filled by the last user input. If all, this action isexecuted when all of the mentioned input items are filled, and atleast one has been filled by the last user input. A<filled> element in an input item cannot specify a mode;if a mode is specified, then an error.badfetch is thrown bythe platform upon encountering the document.
namelist	The input items to trigger on. For a<filled> in a form, namelist defaults to the names(explicit and implicit) of the form's input items. A<filled> element in an input item cannot specify a namelist(the namelist in this case is the input item name); if anamelist is specified, then an error.badfetch is thrown by theplatform upon encountering the document. Note that controlitems are not permitted in this list; an error.badfetch isthrown when the document contains a <filled> element with anamelist attribute referencing a control item variable.

2.5Links

A <link> element may have one or more grammars which arescoped to the element containing the <link>. A"scope" attribute on the element containing the <link> hasno effect on the scope of the <link> grammars (for example,when a <link> is contained in a <form> withscope="document", the <link> grammars are scoped to theform, not to the document). Grammar elements contained inthe <link> are not permitted to specify scope (seeSection 3.1.3 for details).When one of these grammars is matched, the link activates, andeither:

Transitions to a new document or dialog (like <goto>),or
Throws an event (like <throw>).

For instance, this link activates when you say "books" orpress "2".

<link next="http://www.voicexml.org/books/main.vxml">  <grammar mode="voice" version="1.0" root="root">    <rule scope="public">       <one-of>         <item>books</item>          <item>VoiceXML books</item>        </one-of>    </rule>  </grammar>  <grammar mode="dtmf" version="1.0" root="r2">     <rule scope="public"> 2 </rule>  </grammar></link>

This link takes you to a dynamically determined dialog in thecurrent document:

<link expr="'#' + document.helpstate">  <grammar mode="voice" version="1.0" root="root">     <rule scope="public"> help </rule>  </grammar></link>

The <link> element can be a child of <vxml>,<form>, or of the form items <field> and<initial>. A link at the <vxml> level has grammarsthat are active throughout the document. A link at the<form> level has grammars active while the user is in thatform. If an application root document has a document-level link,its grammars are active no matter what document of theapplication is being executed.

If execution is in a modal form item, then link grammars atthe form, document or application level are not active.

You can also define a link that, when matched, throws an eventinstead of going to a new document. This event is thrown at thecurrent location in the execution, not at the location where thelink is specified. For example, if the user matches thislink's grammar or enters '2' on the keypad, a help event isthrown in the form item the user was visiting and is handled bythe best qualified <catch> in the item's scope (seeSection 5.2.4 for furtherdetails):

<link dtmf="2" event="help">  <grammar mode="voice" version="1.0" root="r5">    <rule scope="public">       <one-of>         <item>arrgh</item>          <item>alas all is lost</item>          <item>fie ye froward machine</item>          <item>I don't get it</item>        </one-of>    </rule>  </grammar></link>

When a link is matched, application.lastresult$ is assigned.This allows callflow decisions to be made downstream based on theactual semantic result. An example appears inSection 5.1.5.

Conceptually the link element can be thought of as having twoparts: condition and action. The "condition" is the content ofthe link element, i.e. the grammar(s) that must be matched inorder for the link to be activated. The "action" is specified bythe attributes of the element, i.e. where to transition or whichevent to throw. The "condition" is resolved/evaluated lexically,while the "action" is resolved/evaluated dynamically.Specifically this means that

any URIs in the content of the link are resolved lexically,i.e. according to the base URI (see xml:base inSection 1.5.1) for the documentin which the link is defined.
any URIs in an attribute of the link element are resolveddynamically, i.e. according to the base URI in effect when thelink's grammar is matched.
any ECMAScript expressions in an attribute of the linkelement are evaluated dynamically, i.e. in the scope andexecution context in effect when the grammar is matched.

Attributes of <link> are:

Table 27: <link>Attributes
next	The URI to go to. This URI is adocument (perhaps with an anchor to specify the starting dialog),or a dialog in the current document (just a bare anchor).
expr	Like next, except that the URI isdynamically determined by evaluating the given ECMAScriptexpression.
event	The event to throw when the usermatches one of the link grammars.
eventexpr	An ECMAScript expression evaluatingto the name of the event to throw when the user matches one ofthe link grammars.
message	A message string providing additionalcontext about the event being thrown. The message is available asthe value of a variable within the scope of the catch element,seeSection 5.2.2.
messageexpr	An ECMAScript expression evaluatingto the message string.
dtmf	The DTMF sequence for this link. Itis equivalent to a simple DTMF <grammar> and DTMFproperties (Section 6.3.3)apply to recognition of the sequence. Unlike DTMF grammars,whitespace is optional: dtmf="123#" is equivalent to dtmf="1 2 3#". The attribute can be used at the same time as other<grammar>s: the link is activated when user input matches alink grammar or the DTMF sequence.
fetchaudio	SeeSection 6.1. This defaults to the fetchaudioproperty.
fetchhint	SeeSection 6.1. This defaults to thedocumentfetchhint property.
fetchtimeout	SeeSection 6.1. This defaults to the fetchtimeoutproperty.
maxage	SeeSection 6.1. This defaults to the documentmaxageproperty.
maxstale	SeeSection 6.1. This defaults to thedocumentmaxstale property.

3. UserInput

3.1Grammars

3.1.1 Speech Grammars

The <grammar> element is used to provide a speechgrammar that

specifies a set of utterances that a user may speak to performan action or supply information, and
for a matching utterance, returns a corresponding semanticinterpretation. This may be a simple value (such as a string), aflat set of attribute-value pairs (such as day, month, and year),or a nested object (for a complex request).

The <grammar> element is designed to accommodate anygrammar format that meets these two requirements. VoiceXMLplatforms must support at least one common format, the XML Formof the W3C Speech Recognition Grammar Specification[SRGS]. VoiceXML platformsshould support the Augmented BNF (ABNF) Form of theW3C Speech Recognition Grammar Specification[SRGS]. VoiceXML platforms may choose tosupport grammar formats other than SRGS. For instance, a platformmight use the <grammar> element's support for PCDATA toinline a proprietary grammar definition or use the "src" and"type" attributes for an external one.

VoiceXML platforms must be a Conforming XML Form GrammarProcessor as defined in the W3C Speech Recognition GrammarSpecification[SRGS]. Whilethis requires a platform to process documents with one or more"xml:lang" attributes defined, it does not require that theplatform must be multi-lingual. When an unsupported language isencountered, the platform throws anerror.unsupported.language event which specifies theunsupported language in its message variable.

Elements of XML Form of SRGS

The following elements are defined in the XML Form of the W3CSpeech Recognition Grammar Specification[SRGS] and are available in VoiceXML 2.0. Thisdocument does not redefine these elements. Refer to the W3CSpeech Recognition Grammar Specification[SRGS] for definitions and examples.

Table 28: SRGS (XML Form)Elements
Element	Purpose	Section (in[SRGS])
<grammar>	Root element of an XML grammar	4.
<meta>	Header declaration of meta content ofan HTTP equivalent	4.11.1
<metadata>	Header declaration of XML metadatacontent	4.11.2
<lexicon>	Header declaration of a pronunciationlexicon	4.10
<rule>	Declare a named rule expansion of agrammar	3.
<token>	Define a word or other entity thatmay serve as input	2.1
<ruleref>	Refer to a rule defined locally orexternally	2.2
<item>	Define an expansion with optionalrepeating and probability	2.3
<one-of>	Define a set of alternative ruleexpansions	2.4
<example>	Element contained within a ruledefinition that provides an example of input that matches therule	3.3
<tag>	Define an arbitrary string that to beincluded inline in an expansion which may be used for semanticinterpretation	2.6

3.1.1.1 Inline Grammars

The <grammar> element may be used to specify aninline grammar or anexternal grammar. Aninline grammar is specified by the content of a <grammar>element and defines an entire grammar:

<grammar type="media-type" mode="voice">inline speech grammar</grammar>

It may be necessary in this case to enclose the content in aCDATA section[XML]. Forinline grammars the type parameter specifies a media type thatgoverns the interpretation of the content of the <grammar>element.

The following is an example of inline grammar defined by theXML Form of the W3C Speech Recognition Grammar Specification[SRGS].

<grammar mode="voice" xml:lang="en-US" version="1.0" root="command">  <!-- Command is an action on an object -->  <!-- e.g. "open a window" -->  <rule scope="public">    <ruleref uri="#action"/> <ruleref uri="#object"/>  </rule>  <rule>    <one-of>      <item> open </item>      <item> close </item>      <item> delete </item>      <item> move </item>    </one-of>  </rule>  <rule>   <item repeat="0-1">      <one-of> <item> the </item> <item> a </item> </one-of>    </item>    <one-of>      <item> window </item>      <item> file </item>      <item> menu </item>    </one-of>  </rule></grammar>

The following is the equivalent example of the inline grammardefined by the ABNF Form of the W3C Speech Recognition GrammarSpecification[SRGS].Because VoiceXML platforms are not required to support thisformat it may be less portable.

<grammar mode="voice" type="application/srgs">#ABNF 1.0;language en-US;mode voice;root $command;  public $command = $action $object;  $action = open | close | delete | move;  $object = [the | a] (window | file | menu);</grammar>

3.1.1.2 External Grammars

An external grammar is specified by an element of the form

<grammar src="URI" type="media-type"/>

The media type is optional in this case because theinterpreter context will attempt to determine the typedynamically as described inSection 3.1.1.4.

If the src attribute is defined and there is an inline grammaras content of a grammar element then an error.badfetch event isthrown.

The following is an example of a reference to an externalgrammar written in the XML Form of the W3C Speech RecognitionGrammar Specification[SRGS].

<grammar type="application/srgs+xml" src="http://www.grammar.example.com/date.grxml"/>

The following example is the equivalent grammar reference fora grammar that is authored using the ABNF Form of the W3C SpeechRecognition Grammar Specification[SRGS].

<grammar type="application/srgs" src="http://www.grammar.example.com/date.gram"/>

3.1.1.3 Grammar Weight

A weight for the grammar can be specified by the weightattribute:

<grammar weight="0.6" src="form.grxml" type="application/srgs+xml"/>

Grammar elements, including those in link, field and formelements, can have a weight attribute. The grammar can be inline,external or built-in.

Weights follow the definition of weights on alternatives inthe W3C Speech Recognition Grammar Specification[SRGS §2.4.1]. A weight isa simple positive floating point values without exponentials.Legal formats are "n", "n.", ".n" and "n.n" where "n" is asequence of one or many digits.

A weight is nominally a multiplying factor in the likelihooddomain of a speech recognition search. A weight of "1.0" isequivalent to providing no weight at all. A weight greater than"1.0" positively biases the grammar and a weight less than "1.0"negatively biases the grammar. If unspecified, the default weightfor any grammar is "1.0". If no weight is specified for anygrammar element then all grammars are equally likely.

<link event="help">  <grammar weight="0.5" mode="voice" version="1.0" root="help">   <rule scope="public">      <item repeat="0-1">Please</item> help   </rule>    </grammar></link><form>  <grammar src="form.grxml" type="application/srgs+xml"/>  <field name="expireDate">     <grammar weight="1.2" src="http://www.example.org/grammar/date"/>  </field></form>

In the example above, the semantics of weights is equivalentto the following XML grammar.

<grammar root="r1" type="application/srgs+xml"><rule>  <one-of>    <item weight="0.5"> <ruleref uri="#help"/> </item>    <item weight="1.0"> <ruleref uri="form.grxml"/> </item>    <item weight="1.2"> <ruleref uri="http://www.example.org/grammar/date"/></item>  </one-of></rule><rule>  <item repeat="0-1">Please</item> help</rule></grammar>

Implicit grammars, such as those in options, do not supportweights - use the <grammar> element instead for controlover grammar weight.

Grammar weights only affect grammar processing. They do notdirectly affect the post processing of grammar results, includinggrammar precedence when user input matches multiple activegrammar (seeSection3.1.4).

A weight has no effect on DTMF grammars (SeeSection 3.1.2). Any weightattribute specified in a grammar element whose mode attribute isdtmf is ignored.

<!-- weight will be ignored --><grammar mode="dtmf" weight="0.3" src="http://www.example.org/dtmf/number"/>

Appropriate weights are difficult to determine, and guessingweights does not always improve recognition performance.Effective weights are usually obtained by study of real speechand textual data on a particular platform. Furthermore, a grammarweight is platform specific. Note that different ASR engines maytreat the same weight value differently. Therefore, the weightvalue that works well on particular platform may generatedifferent results on other platforms.

3.1.1.4 Grammar Element

Attributes of <grammar> inherited from the W3C SpeechRecognition Grammar Specification[SRGS] are:

Table 29: <grammar> AttributesInherited from SRGS
version	Defines the version of thegrammar.
xml:lang	Thelanguage identifier of the grammar (forexample, "fr-CA" for Canadian French.) If omitted, the value isinherited down from the document hierarchy.
mode	Defines the mode of the grammarfollowing the modes of the W3C Speech Recognition GrammarSpecification[SRGS].
root	Defines the rule which acts as theroot rule of the grammar.
tag-format	Defines the tag content format forall tags within the grammar.
xml:base	Declares the base URI from whichrelative URIs in the grammar are resolved. This base declaration has precedence over the <vxml>base URI declaration. If a local declaration is omitted, thevalue is inherited down the document hierarchy.

The use and interpretation of these attributes is determinedas follows:

Inline XML Form of SRGS: determined by W3C Speech RecognitionGrammar Specification which states that the version attribute isrequired and must have the value is "1.0"; that the rootattribute is required and its value identifies which rule toactivate; and other attributes are optional (see[SRGS] for furtherdetails).
Inline ABNF Form of SRGS: any specified attributes must beignored by the platform
External XML and ABNF Forms of SRGS: any specified attributesmust be ignored by the platform
all other grammar types: the use and interpretation of anyspecified attributes is platform-dependent

Attributes of <grammar> added by VoiceXML 2.0 are:

Table 30: <grammar> AttributesAdded in VoiceXML

src

The URI specifying the location ofthe grammar and optionally a rulename within that grammar, if itis external. The URI is interpreted as a rule reference asdefined in Section 2.2 of the Speech Recognition GrammarSpecification[SRGS] but notall forms of rule reference are permitted from within VoiceXML.The rule reference capabilities are described in detail belowthis table.

scope

Either "document", which makes thegrammar active in all dialogs of the current document (andrelevant application leaf documents), or "dialog", to make thegrammar active throughout the current form. If omitted, thegrammar scoping is resolved by looking at the parent element. SeeSection 3.1.3 for details onscoping including precedence behavior.

type

The preferred media type of the grammar. A resource indicatedby the URI reference in the src attribute may be available in oneor more media types. The author may specify the preferredmedia-type via the type attribute. When the content representedby a URI is available in many data formats, a VoiceXML platformmay use the preferred media-type to influence which of themultiple formats is used. For instance, on a server implementingHTTP content negotiation, the processor may use the preferredmedia-type to order the preferences in the negotiation.

The resource representation delivered by dereferencing the URIreference may be considered in terms of two types. Thedeclared media-type is the asserted value for the resourceand theactual media-type is the true format of itscontent. The actual media-type should be the same as the declaredmedia-type, but this is not always the case (e.g. a misconfiguredHTTP server might return 'text/plain for an'application/srgs+xml' document). A specific URI scheme mayrequire that the resource owner always, sometimes, or neverreturn a media-type. The declared media-type is the valuereturned by the resource owner or, if none is returned, thepreferred media type. There may be no declared media-type if theresource owner does not return a value and no preferred type isspecified. Whenever specified, the declared media-type isauthoritative.

Three special cases may arise. The declared media-type may notbe supported by the processor; in this case, anerror.unsupported.format is thrown by the platform. The declaredmedia-type may be supported but the actual media-type may notmatch; an error.badfetch is thrown by the platform. Finally,there may be no declared media-type; the behavior depends on thespecific URI scheme and the capabilities of the grammarprocessor. For instance, HTTP 1.1 allows document intraspection(see[RFC2616], section7.2.1), the data scheme falls back to a default media type, andlocal file access defines no guidelines. The following tableprovides some informative examples:

	HTTP 1.1 request		Local file access
Media-type returned bythe resource owner	text/plain	application/srgs+xml	<none>	<none>
Preferred media-typeappearing in the grammar	Not applicable; the returned typetakes precedence		application/srgs+xml	<none>
Declared media-type	text/plain	application/srgs+xml	application/srgs+xml	<none>
Behavior if the actualmedia-type is application/srgs+xml	error.badfetch thrown;the declared and actual types do not match	The declared and actual types match;success if application/srgs+xml is supported by the processor;otherwise an error.unsupported.format is thrown		Scheme specific; the processor mightintraspect the document to determine the type.

The tentative media types for the W3C grammar format are"application/srgs+xml" for the XML form and "application/srgs"for ABNF grammars.

weight

Specifies the weight of the grammar.SeeSection 3.1.1.3

fetchhint

SeeSection 6.1. This defaults to thegrammarfetchhint property.

fetchtimeout

SeeSection 6.1. This defaults to the fetchtimeoutproperty.

maxage

SeeSection 6.1. This defaults to the grammarmaxageproperty.

maxstale

SeeSection 6.1. This defaults to thegrammarmaxstale property.

Either an "src" attribute or a inline grammar (but not both)must be specified; otherwise, an error.badfetch event isthrown.

The <grammar> element is also extended in VoiceXML 2.0to allow PCDATA for inline grammar formats besides the XML Formof the W3C Speech Recognition Grammar Specification[SRGS].

When referencing an external grammar, the value of the srcattribute is a URI specifying the location of the grammar with anoptional fragment for the rulename. Section 2.2 of the SpeechRecognition Grammar Specification[SRGS] defines several forms of rule reference.The following are the forms that are permitted on a grammarelement in VoiceXML:

Reference to a named rule in an external grammar:src attribute is an absolute or relative URI reference to agrammar which includes a fragment with a rulename. This form ofrule reference to an external grammar follows the behaviordefined in Section 2.2.2 of[SRGS]. If the URI cannot be fetched or if therulename is not defined in the grammar or is not apublic(activatable) rule of that grammar then an error.badfetch isthrown.
Reference to the root rule of an external grammar:src attribute is an absolute or relative URI reference to agrammar but does not include a fragment identifying a rulename.This form implicitly references the root rule of the grammar asdefined in Section 2.2.2 of[SRGS]. If the URI cannot be fetched or if thegrammar cannot be referenced by its root (see Section 4.7 of[SRGS]) then an error.badfetchis thrown.

The following are the forms of rule reference defined by[SRGS] that arenot supported in VoiceXML 2.0.

Local rule reference: a fragment-only URI is notpermitted. (See definition in Section 2.2.1 of[SRGS]). A fragment-only URI value for the srcattribute causes an error.semantic event.
Reference to special rules: although an inlinegrammar may reference the special rules of SRGS (NULL, VOID,GARBAGE) there is no support for special rule references on thegrammar element itself. (See definitions in Section 2.2.3 of[SRGS]). There is no syntacticsupport for this form so no error can be generated.

3.1.2 DTMF Grammars

The <grammar> element can be used to provide a DTMFgrammar that

specifies a set of key presses that a user may use to performan action or supply information, and
for matching DTMF input, returns a corresponding semanticinterpretation. This may be a simple value (such as a string), aflat set of attribute-value pairs (such as day, month, and year),or a nested object (for a complex request).

VoiceXML platforms are required to support the DTMF grammarXML format defined in Appendix D of the[SRGS] to advance application portability.

A DTMF grammar is distinguished from a speech grammar by themode attribute on the <grammar> element. An "xml:lang"attribute has no effect on DTMF grammar handling. In otherrespects speech and DTMF grammars are handled identicallyincluding the ability to define the grammar inline, or by anexternal grammar reference. The media type handling, scoping andfetching are also identical.

The following is an example of a simple inline XML DTMFgrammar that accepts as input either "1 2 3" or "#".

<grammar mode="dtmf" version="1.0" root="root">  <rule scope="public">    <one-of>      <item> 1 2 3 </item>      <item> # </item>    </one-of>  </rule></grammar>

3.1.3 Scope of Grammars

Input item grammars are always scoped to thecontaining input item; that is, they are active only when thecontaining input item was chosen during the select phase ofthe FIA. Grammars contained in input items cannot specify ascope; if they do, an error.badfetch is thrown.

Link grammars are given the scope of the element thatcontains the link. Thus, if they are defined in the applicationroot document, links are also active in any other loadedapplication document. Grammars contained in links cannot specifya scope; if they do, an error.badfetch is thrown.

Form grammars are by default given dialog scope, sothat they are active only when the user is in the form. If theyare given scope document, they are active whenever the user is inthe document. If they are given scope document and the documentis the application root document, then they are also activewhenever the user is in another loaded document in the sameapplication. A grammar in a form may be given document scopeeither by specifying the scope attribute on the form element orby specifying the scope attribute on the <grammar> element.If both are specified, the grammar assumes the scope specified bythe <grammar> element.

Menu grammars are also by default given dialog scope,and are active only when the user is in the menu. But they can begiven document scope and be active throughout the document, andif their document is the application root document, also beactive in any other loaded document belonging to the application.Grammars contained in menu choices cannot specify a scope; ifthey do, an error.badfetch is thrown.

Sometimes a form may need to have some grammars activethroughout the document, and other grammars that should be activeonly when in the form. One reason for doing this is to minimizegrammar overlap problems. To do this, each individual<grammar> element can be given its own scope if that scopeshould be different than the scope of the <form> elementitself:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form scope="document">   <grammar type="application/srgs">     #ABNF 1.0;    language en-gb;    mode voice;    root $command;    public $command = one | two | three;  </grammar>   <grammar type="application/srgs" scope="dialog">     #ABNF 1.0;    language en-gb;    mode voice;    root $command2;    public $command2 = four | five | six;  </grammar> </form></vxml>

3.1.4 Activation of Grammars

When the interpreter waits for input as a result of visitingan input item, the following grammars are active:

grammars for that input item, including grammarscontained in links in that input item;
grammars for its form, including grammars contained in linksin that form;
grammars contained in links in its document, and grammars formenus and other forms in its document which are given documentscope;
grammars contained in links in its application root document,and grammars for menus and forms in its application root documentwhich are given document scope.
grammars defined by platform default event handlers, such ashelp, exit and cancel.

In the case that an input matches more than one activegrammar, the list above defines the precedence order. If theinput matches more than one active grammar with the sameprecedence, the precedence is determined using document order:the first grammar in document order has highest priority. If nogrammars are active when an input is expected, the platform mustthrow an error.semantic event. The error will be thrown in thecontext of the executing element. Menus behave with regard togrammar activation like their equivalent forms (seeSection 2.2.1).

If the form item is modal (i.e., its modal attribute is set totrue), all grammars except its own are turned off while waitingfor input. If the input matches a grammar in a form or menu otherthan the current form or menu, control passes to the other formor menu. If the match causes control to leave the current form,all current form data is lost.

Grammar activation is not affected by the inputmodes property.For instance, if the inputmodes property restricts input to justvoice, DTMF grammars will still be activated, but cannot bematched.

3.1.5 Semantic Interpretation of Input

The Speech Recognition Grammar Specification defines a tagelement which contains content for semantic interpretation ofspeech and DTMF grammars (see Section 2.6 of[SRGS]).

The Semantic Interpretation for Speech Recognitionspecification[SISR]describes a syntax and semantics for tags and specifies how asemantic interpretation for user input can be computed using thecontent of tags associated with the matched tokens and rules. Thesemantic interpretation may be mapped into VoiceXML as describedinSection 3.1.6.

3.1.6 Mapping Semantic Interpretation Results toVoiceXML forms

The semantic interpretation returned from a Speech RecognitionGrammar Specification[SRGS]grammar must be mapped into one or more VoiceXML ECMAScriptvariables. The process by which this occurs differs slightly forform- and field-level results; these differences will be exploredin the next sections. The format of the semantic interpretation,using either the proposed Natural Language Semantics MarkupLanguage[NLSML] or theECMAScript-like output format of[SISR], has no impact on this discussion. Forthe purposes of this discussion, the actual result returned fromthe recognizer is assumed to have been mapped into anECMAScript-like format which is identical to the representationin application.lastresult$.interpretation as discussed inSection 5.1.5.

It is possible that a grammar will match but not return asemantic interpretation. In this case, the platform will use theraw text string for the utterance as the semantic result.Otherwise, this case is handled exactly as if the semanticinterpretation consisted of a simple value.

Every input item has an associatedslot name which maybe used to extract part of the full semantic interpretation. Theslot name is the value of the 'slot' attribute, if present (onlypossible for <field> elements), or else the value of the'name' attribute (for <field>s without a slot attribute,and for other input items as well). If neither slot nor name ispresent, then the slot name is undefined.

The slot name is used during the Process Phase of theFIA to determine whether or notan input item matches. A match occurs when either the slot nameis the same as a top-level property or a slot name is used toselect a sub-property. A property having an undefined value (i.e.ECMAScript undefined) will not match. Likewise, slot names whichare undefined will never match. Examples are given inSection 3.1.6.3. Note that itis possible for a specific slot value to fill more than one inputitem if the slot names of the input items are the same.

The next sections concern mapping form-level and field-levelresults. There is also a brief discussion of other issues such asthe NL Semantics to ECMAScript mapping, transitioninginformation from ASR results to VoiceXML, and dealing withmismatches between the interpretation result and the VoiceXMLform.

3.1.6.1 Mapping form-level results

Grammars specified at the form-level produce aform-levelresult which may fill multiple input items simultaneously.This may occur anytime, whether in an <initial> element orin an input item, that the user's input matches an activeform-level grammar.

Consider the interpretation result from the sentence "I wouldlike a coca cola and three large pizzas with pepperoni andmushrooms." The semantic interpretation may be copied intoapplication.lastresult$.interpretation as

{     drink: "coke",     pizza: {          number: "3",          size: "large",           topping: [                "pepperoni",                "mushrooms"           ]      }}

The following table illustrates how this result from aform-level grammar would be assigned to various input itemswithin the form. Note that all input items that can be filledin from the interpretation are filled in simultaneously. Theexisting values of matching input item variables will beoverwritten, and these items will be marked for <filled>processing during the FIA's Process Phase as described inSection 2.4 andAppendix C.

Table 31: Form-level GrammarAssignments
VoiceXML field	Assigned ECMAScript value	Explanation
1.<field name="drink"/>--or-- <object name="drink"/>--or-- <record name="drink"/>	"coke"	By default aninput item is assigned the top-level result property whosename matches the input item name.
2.<field name="..." slot="drink"/>	"coke"	If specified for afield, the slot name overrides the field namefor selecting the result property.
3.<field name="pizza"/>--or-- <object name="pizza"/>--or-- <record name="pizza"/>--or-- <field name="..." slot="pizza"/>	{number: "3", size:"large", topping: ["pepperoni", "mushroom"]}	The input item nameor slot may select a property that is a non-scalar ECMAScriptvariable in the same way that a scalar value is selected in theprevious example. However the application must then handleinspecting the components of the object. This does not takeadvantage of the VoiceXML form-filling algorithm, in that missingslots in the result would not be automatically prompted for. Thismay be sufficient in situations where the server is prepared todeal with a structured object. Otherwise, an application mayprefer to use the method described in the next example.
4.<field name="..." slot="pizza.number"/> <field name="..." slot="pizza.size"/>	"3" "large"	The slot may be used toselect a sub-property of the result. This approach distributesthe result among a number of fields.
5.<field name="..." slot="pizza.topping"/>	["pepperoni","mushroom"]	The selected propertymay be a compound object.

The <field ... slot="pizza.foo"> examples above can beexplained by rules that are compatible with and arestraightforward extensions of the VoiceXML 1.0 "name" and "slot"attributes:

The "slot" attribute of a <field> is a (veryrestricted) ECMAScript expression that selects some portion ofthe result to be assigned to the field. In addition to selectingthe top-level result property, the attribute can selectproperties at arbitrary levels of nesting, using a dot-separatedlist of element/property names, as in "pizza.number" and"order.pizza.topping".
If the "slot" attribute of a field is used to select asub-property of the result and that sub-property does not existin the result, then the fielddoes not match the result(seeSection3.1.6).

3.1.6.2 Mapping a field-level result

Grammars specified within an input item produce afield-level result which may fill only the particularinput item in which they are contained. These grammars are activeonly when the FIA is visiting that specific input item. This isuseful, for instance, in directed dialogs where a user isprompted individually for each input item.

A field-level result fills the associated inputitem in the following manner:

If the interpretation is a simple result, this is assigned tothe input item variable.
If the interpretation is a structure and the slot namematches a property, this property's value is assigned to theinput item variable.
Otherwise, the full semantic result is assigned.

This process allows an input item to extract a particularproperty from the semantic interpretation. This may be combinedwith <filled> for achieve even greater control.

<field name="getdate">  <prompt>On what date would you like to fly?</prompt>  <grammar src="http://server.example.com/date.grxml"/>  <!-- this grammar always returns an object containing       string values for the properties day, month, and year -->  <filled>    <assign name="getdate.datestring"             expr="getdate.year + getdate.month + getdate.day"/>  </filled></field>

3.1.6.3 Additional examples

A matching slot name allows an input item to extract part of asemantic interpretation. Consider this modified result from theearlier pizza example.

application.lastresult$.interpretation ={ drink: { size: 'large', liquid: 'coke' },  pizza: { number: '3', size: 'large',           topping: ['pepperoni', 'mushroom' ] },   sidedish: undefined}

The table below revisits the definition of when the slot namematches a property in the result.

Table 32: Slot Name Matching
slot name	match or not?
undefined	does not match
drink	matches; top level property
pizza	matches; top level property
sidedish	does not match; no defined value
size	does not match; not a top-levelproperty
pizza.size	matches; sub-property
pizza.liquid	does not match

It is also possible to compare the behaviors of form-level andfield-level results. For this purpose, consider the followingdocument:

<?xml version="1.0" encoding="UTF-8"?>  <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">    <form>      <grammar src="formlevel.grxml"/>      <initial> Say something. </initial>      <field name="x">        <grammar src="fieldx.grxml"/>      </field>      <field name="z" slot="y">        <grammar src="fieldz.grxml"/>      </field>    </form>  </vxml>

This defines two input item variables, 'x' and 'z'. Thecorresponding slot names are 'x' and 'y' respectively. The nexttable describes the assignment of these variables depending onwhich grammar is recognized and what semantic result is returned.The shorthandvalueX is used to indicate 'the structuredobject or simple result value associated with the propertyx'.

Table 33: Variable Assignments Dependingon Grammar and Semantic Result
application. lastresult$.interpretation	form-level result (formlevel.grxml)	field-level result in field x (fieldx.grxml)	field-level result in field z (fieldz.grxml)
= 'hello'	no assignment; cycle FIA	x = 'hello'	z = 'hello'
= { x: valueX }	x = valueX	x = valueX	z = { x: valueX }
= { y: valueY }	z = valueY	x = { y: valueY }	z = valueY
= { z: valueZ }	no assignment; cycle FIA	x = { z: valueZ }	z = { z: valueZ }
= { x: valueX, y: valueY, z: valueZ }	x = valueX z = valueY	x = valueX	z = valueY
= { a: valueA b: valueB }	no assignment; cycle FIA	x = { a: valueA, b: valueB }	z = { a: valueA, b: valueB }

At the form level, simple results like the string 'hello'cannot match any input items; structured objects assign all inputitem variables with matching slot names. At the field level,simple results are always assigned to the input item variable;structured objects will extract the matching property, if itexists, or will otherwise be assigned the entire semanticresult.

3.1.6.4 Additional issues

1. Mapping from NL semantics to ECMAScript: If the NLSemantics Markup Language ([NLSML]) is used, a mapping needs to be definedfrom the NLSML representation to ECMAScript objects. Since bothtypes of representation have similar nested structures, thismapping is fairly straightforward. This mapping is discussed indetail in the NL Semantics specification.

2. Transitioning semantic results from ASR to VoiceXML: Theresult of processing the semantic tags of a W3C ASR grammar isthe value of the attribute of the root rule when all semanticattachment evaluations have been completed. In addition, the rootrule (like all non-terminals) has an associated "text" variablewhich contains the series of tokens in the utterance that isgoverned by that non-terminal. In the process of making ASRresults available to VoiceXML documents, the VoiceXML platform isnot only responsible for filling in the VoiceXML fields based onthe value of the attribute of the root rule, as described above,but also for filling in the shadow variables of the field. Thename$.utterance shadow variable of the field should be the sameas the "text" variable value for the ASR root rule. The platformis also responsible for instantiating the value of the shadowvariable "name$.confidence" based on information supplied by theASR platform, as well as the value of "name$.inputmode" based onwhether DTMF or speech was processed. Finally, the platform isresponsible for making this same information available in the"application.lastresult$" variable, defined in Section 5.1.5(specifically, "application.lastresult$.utterance","application.lastresult$.inputmode", and"application.lastresult$.interpretation"), with the exception ofapplication.lastresult$.confidence, which the platform sets tothe confidence of the entire utterance interpretation.

3. Mismatches between semantic results and VoiceXML fields:Mapping semantic results to VoiceXML depends on a tightcoordination between the ASR grammar and the VoiceXML markup.Since in the current framework there's nothing that enforcesconsistency between a grammar and the associated VoiceXML dialog,mismatches can occur due to developer oversight. Since thedialog's behaviour during these mismatches is difficult todistinguish from certain normal situations, verifying consistencyof information is extremely important. Some examples ofmismatches:

The semantic results contain extra information that doesn'tcorrespond to the VoiceXML fields.This could occur either due todeveloper error or if a richer grammar is being used than isrequired by the VoiceXML application. This extra information willbe ignored.
The VoiceXML application expects information in the resultthat isn't present. This could also be due to developer error, orit may be that the user simply didn't supply a value for aparticular slot. In this case the normal FIA applies and themissing value would be elicited from the user. If the problem wasin fact caused by a developer error, and the grammar is actuallyincapable of recognizing the correct value, the FIA will keepeliciting the missing value until it invokes whatever provisionsthe platform and application have in place for repeated nomatchfailures.
Finally, information might be present in both the VoiceXMLand the ASR result, but in inconsistent formats. For example, anASR grammar might provide a structured object for the drink whichincludes the size and whether the drink is diet or non-diet, butthe VoiceXML form might only expect a string consisting of thename of the drink. The system's behaviour in these situationswould depend on what is being done with the results. For example,a structured object might be sent to a server side script that'sexpecting a string, and the consequences of this would depend onthe server script.

In order to address these potential problems, the committee islooking at various approaches to ensuring consistency between thegrammar and the VoiceXML.

4. SystemOutput

4.1Prompts

The <prompt> element controls the output of synthesizedspeech and prerecorded audio. Conceptually, prompts areinstantaneously queued for play, so interpretation proceeds untilthe user needs to provide an input. At this point, the promptsare played, and the system waits for user input. Once the inputis received from the speech recognition subsystem (or the DTMFrecognizer), interpretation proceeds.

The <prompt> element has the following attributes:

Table 34: <prompt>Attributes
bargein	Control whether a user can interrupta prompt. This defaults to the value of the bargein property. SeeSection 6.3.4.
bargeintype	Sets the type of bargein to be'speech', or 'hotword'. This defaults to the value of thebargeintype property. SeeSection 6.3.4.
cond	An expression that must evaluate totrue after conversion to boolean in order for the prompt to beplayed. Default is true.
count	A number that allows you to emitdifferent prompts if the user is doing something repeatedly.If omitted, it defaults to "1".
timeout	The timeout that will be used forthe following user input. The value is a Time Designation (seeSection 6.5). The defaultnoinput timeout is platform specific.
xml:lang	Thelanguage identifier for the prompt. Ifomitted, it defaults to the value specified in the document's"xml:lang" attribute.
xml:base	Declares the base URI from whichrelative URIs in the prompt are resolved. This base declarationhas precedence over the <vxml> base URI declaration. If alocal declaration is omitted, the value is inherited down thedocument hierarchy.

4.1.1 Speech Markup

The content of the <prompt> element is modelled on theW3C Speech Synthesis Markup Language 1.0[SSML].

The following speech markup elements are defined in[SSML] and are available inVoiceXML 2.0. Refer to the W3C Speech Synthesis Markup Language1.0 [SSML] for definitionsand examples.

Table 35: SSML Elements Available inVoiceXML
Element	Purpose	Section (in SSML spec)
<audio>	Specifies audio files to be playedand text to be spoken.	3.3.1
<break>	Specifies a pause in the speechoutput.	3.2.3
<desc>	Provides a description of anon-speech audio source in <audio>.	3.3.3
<emphasis>	Specifies that the enclosed textshould be spoken with emphasis.	3.2.2
<lexicon>	Specifies a pronunciation lexicon forthe prompt.	3.1.4
<mark>	Ignored by VoiceXML platforms.	3.3.2
<meta>	Specifies meta and "http-equiv"properties for the prompt.	3.1.5
<metadata>	Specifies XML metadata content forthe prompt.	3.1.6
<p>	Identifies the enclosed text as aparagraph, containing zero or more sentences	3.1.7
<phoneme>	Specifies a phonetic pronunciationfor the contained text.	3.1.9
<prosody>	Specifies prosodic information forthe enclosed text.	3.2.4
<say-as>	Specifies the type of text constructcontained within the element.	3.1.8
<s>	Identifies the enclosed text as asentence.	3.1.7
<sub>	Specifies replacement spoken text forthe contained text.	3.1.10
<voice>	Specifies voice characteristics forthe spoken text.	3.2.1

When used in VoiceXML, additional properties are defined forthe <audio> (Section4.1.3) and <say-as> (Appendix P) elements. VoiceXML also allows<enumerate> and <value> elements to appear within the<prompt> element.

The VoiceXML platform must be a Conforming Speech SynthesisMarkup Language Processor as defined in the[SSML]. While this requires a platform toprocess documents with one or more "xml:lang" attributes defined,it does not require that the platform must be multi-lingual. Whenan unsupported language is encountered, the platform throws anerror.unsupported.language event which specifies thelanguage in its message variable.

4.1.2 Basic Prompts

You've seen prompts in the previous examples:

<prompt>Please say your city.</prompt>

You can leave out the <prompt> ... </prompt>if:

There is no need to specify a prompt attribute (like bargein),and
The prompt consists entirely of PCDATA (contains no speechmarkups) or consists of just an <audio> or <value>element.

For instance, these are also prompts:

Please say your city.<audio src="say_your_city.wav"/>

But in this example, the enclosing prompt elements arerequired due to the embedded speech markups:

<prompt>Please <emphasis>say</emphasis> your city.</prompt>

When prompt content is specified without an explicit<prompt> element, then the prompt attributes are defined asspecified inTable 34.

4.1.3 Audio Prompting

Prompts can consist of any combination of prerecorded files,audio streams, or synthesized speech:

<prompt>   Welcome to the Bird Seed Emporium.   <audio src="rtsp://www.birdsounds.example.com/thrush.wav"/>   We have 250 kilogram drums of thistle seed for   <say-as interpret-as="currency">$299.95</say-as>   plus shipping and handling this month.   <audio src="http://www.birdsounds.example.com/mourningdove.wav"/></prompt>

Audio can be played in any prompt. The audio content can bespecified via a URI, and in VoiceXML it can also be in anaudio variable previously recorded:

<prompt>   Your recorded greeting is   <audio expr="greeting"/>   To rerecord, press 1.   To keep it, press pound.   To return to the main menu press star M.   To exit press star, star X.</prompt>

The <audio> element can have alternate content in casethe audio sample is not available:

<prompt>   <audio src="welcome.wav">      <emphasis>Welcome</emphasis> to the Voice Portal.    </audio></prompt>

If the audio file cannot be played (e.g. 'src' referencing or'expr' evaluating to an invalid URI, a file with an unsupportedformat, etc), the content of the audio element is played instead.The content may include text, speech markup, or another audioelement. If the audio file cannot be played and the contentof the audio element is empty, no audio is played and no errorevent is thrown.

If <audio> contains an 'expr' attribute evaluating toECMAScript undefined, then the element, including its alternatecontent, is ignored. This allows a developer to specify<audio> elements with dynamically assigned content which,if the element is not required, can be ignored by assigning its'expr' a null value. For example, the following code shows howthis could be used to play back a hand of cards usingconcatenated audio clips:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>  <!-- script contains the function sayCard(type,position)       which takes as input the type of card description (audio or text) and       its position in an array, and returns the selected card description in       the specified array position; if there is no description in the        requested array position, then returns ECMAScript undefined  -->     <script src="cardgame.js"/>  <field name="takecard">       <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>       <prompt>           <audio src="you_have.wav">You have the following cards: </audio>           <!-- maximum of hand of 5 cards is described -->           <audio expr="sayCard(audio,1)"><value expr="sayCard(text,1)"/></audio>           <audio expr="sayCard(audio,2)"><value expr="sayCard(text,2)"/></audio>           <audio expr="sayCard(audio,3)"><value expr="sayCard(text,3)"/></audio>           <audio expr="sayCard(audio,4)"><value expr="sayCard(text,4)"/></audio>           <audio expr="sayCard(audio,5)"><value expr="sayCard(text,5)"/></audio>           <audio src="another.wav">Would you like another card?</audio>        </prompt>        <filled>              <if cond="takecard">                  <script>takeAnotherCard()</script>                  <clear/>              <else/>                  <goto next="./make_bid.vxml"/>              </if>        </filled>   </field></form></vxml>

Attributes of <audio> defined in[SSML] are:

Table 36: <audio> AttributesInherited from SSML
src	The URI of the audio prompt. SeeAppendix E for requiredaudio file formats; additional formats may be used if supportedby the platform.

Attributes of <audio> defined only in VoiceXML are:

Table 37: <audio> Attributes addedin VoiceXML
fetchtimeout	SeeSection 6.1. This defaults to the fetchtimeoutproperty.
fetchhint	SeeSection 6.1. This defaults to the audiofetchhintproperty.
maxage	SeeSection 6.1. This defaults to the audiomaxageproperty.
maxstale	SeeSection 6.1. This defaults to the audiomaxstaleproperty.
expr	An ECMAScript expression whichdetermines the source of the audio to be played. The expressionmay be either a reference to audio previously recorded with the<record/> item or evaluate to the URI of an audio resourceto fetch.

Exactly one of "src" or "expr" must be specified; otherwise,an error.badfetch event is thrown.

Note that it is a platform optimization to stream audio: i.e.the platform may begin processing audio content as it arrives andnot to wait for full retrieval. The "prefetch" fetchhint can beused to request full audio retrieval prior to playback.

4.1.4 <value> Element

The <value> element is used to insert the value of anexpression into a prompt. It has one attribute:

Table 38: <value>Attributes
expr	The expression to render.

For example if n is 12, the prompt

<prompt>  <value expr="n*n"/> is the square of <value expr="n"/>.</prompt>

will result in the text string "144 is the square of 12" beingpassed to the speech synthesis engine.

The manner in which the value attribute is played iscontrolled by the surrounding speech synthesis markup. Forinstance, a value can be played as a date in the followingexample:

<var name="date" expr="'2000/1/20'"/><prompt>    <say-as interpret-as="date">        <value expr="date"/>    </say-as></prompt>

The text inserted by the <value> element is not subjectto any special interpretation; in particular, it is not parsed asan[SSML] document ordocument fragment. XML special characters (&, >, and <)are not treated specially and do not need to be escaped. Theequivalent effect may be obtained by literally inserting the textcomputed by the <value> element in a CDATA section. Forexample, when the following variable assignment:

<script>     <![CDATA[         e1 = 'AT&T';    ]]></script>

is referenced in a prompt element as

  <prompt> The price of <value expr="e1"/> is $1. </prompt>

the following output is produced.

 The price of AT&T is $1.

4.1.5 Bargein

If an implementation platform supports bargein, theapplication author can specify whether a user can interrupt, or"bargein" on, a prompt using speech or DTMF input. This speeds upconversations, but is not always desired. If the applicationauthor requires that the user must hear all of a warning, legalnotice, or advertisement, bargein should be disabled. This isdone with the bargein attribute:

<prompt bargein="false"><audio src="legalese.wav"/></prompt>

Users can interrupt a prompt whose bargein attribute is true,but must wait for completion of a prompt whose bargein attributeis false. In the case where several prompts are queued, thebargein attribute of each prompt is honored during the period oftime in which that prompt is playing. If bargein occurs duringany prompt in a sequence, all subsequent prompts are not played(even those whose bargein attribute is set to false). If thebargein attribute is not specified, then the value of thebargein property is used if set.

When the bargein attribute is false, input is notbuffered while the prompt is playing, and any DTMF inputbuffered in a transition state is deleted from the buffer (Section 4.1.8 describes inputcollection during transition states).

Note that not all speech recognition engines or implementationplatforms support bargein. For a platform to support bargein, itmust support at least one of the bargein types described inSection 4.1.5.1.

4.1.5.1 Bargein type

When bargein is enabled, the bargeintype attribute can beused to suggest the type of bargein the platform will performin response to voice or DTMF input. Possible values for thisattribute are:

Table 39: bargeintype Values
speech	The prompt will bestopped as soon as speech or DTMF input is detected.The prompt is stopped irrespective of whether or not theinput matches a grammar and irrespective of which grammarsare active.
hotword	The prompt will not bestopped until a complete match of an active grammar is detected.Input that does not match a grammar is ignored (note that thiseven applies during the timeout period); as a consequence, anomatch event will never be generated in the case of hotwordbargein.

If the bargeintype attribute is not specified, then the valueof the bargeintype property is used. Implementations that claimto support bargein are required to support at least one of thesetwo types. Mixing these types within a single queue of promptscan result in unpredictable behavior and is discouraged.

In the case of "speech" bargeintype, the exact meaning of"speech input" is necessarily implementation-dependent, due tothe complexity of speech recognition technology. It is expectedthat the prompt will be stopped as soon as the platform is ableto reliably determine that the input is speech. Stopping theprompt as early as possible is desireable because it avoids the"stutter" effect in which a user stops in mid-utterance andre-starts if he does not believe that the system has heardhim.

4.1.6 Prompt Selection

Tapered prompts are those that may change with eachattempt. Information-requesting prompts may become more terseunder the assumption that the user is becoming more familiar withthe task. Help messages become more detailed perhaps, under theassumption that the user needs more help. Or, prompts can changejust to make the interaction more interesting.

Each input item, <initial>, and menu has an internalprompt counter that is reset to one each time the form or menu isentered. Whenever the system selects a given input item inthe select phase of FIA and FIA does perform normal selection andqueuing of prompts (i.e., as described inSection 5.3.6, the previous iteration of FIA didnot end with a catch handler that had no reprompt), the inputitem's associated prompt counter is incremented. This is themechanism supporting tapered prompts.

For instance, here is a form with a form level prompt andfield level prompts:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>  <block>    <prompt bargein="false">      Welcome to the ice cream survey.    </prompt>  </block>  <field name="flavor">  <grammar mode="voice" version="1.0" root="root">    <rule scope="public">      <one-of>        <item>vanilla </item>        <item>chocolate</item>        <item>strawberry</item>     </one-of>    </rule>   </grammar>   <prompt count="1">What is your favorite flavor?</prompt>   <prompt count="3">Say chocolate, vanilla, or strawberry.</prompt>   <help>Sorry, no help is available.</help>  </field></form></vxml>

A conversation using this form follows:

C: Welcome to the ice cream survey.
C: What is your favorite flavor?(the"flavor" field's prompt counter is 1)
H: Pecan praline.
C: I do not understand.
C: What is your favorite flavor?(theprompt counter is now 2)
H: Pecan praline.
C: I do not understand.
C: Say chocolate, vanilla, or strawberry.(prompt counter is 3)
H: What if I hate those?
C: I do not understand.
C: Say chocolate, vanilla, or strawberry.(prompt counter is 4)
H: ...

This is just an example to illustrate the use of promptcounters. A polished form would need to offer a more extensiverange of choices and to deal with out of range values in moreflexible way.

When it is time to select a prompt, the prompt counter isexamined. The child prompt with the highest count attribute lessthan or equal to the prompt counter is used. If a prompt has nocount attribute, a count of "1" is assumed.

Aconditional prompt is one that is spoken only if itscondition is satisfied. In this example, a prompt is varied oneach visit to the enclosing form.

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>   <var name="r" expr="Math.random()"/>   <field name="another">       <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>       <prompt cond="r &lt; .50">          Would you like to hear another elephant joke?       </prompt>       <prompt cond="r &gt;= .50">         For another joke say yes.  To exit say no.       </prompt>       <filled>          <if cond="another">            <goto next="#pick_joke"/>          </if>       </filled>   </field></form></vxml>

When a prompt must be chosen, a set of prompts to be queued ischosen according to the following algorithm:

Form an ordered list of prompts consisting of all prompts inthe enclosing element in document order.
Remove from this list all prompts whose cond evaluates tofalse after conversion to boolean.
Find the "correct count": the highest count among the promptelements still on the list less than or equal to the currentcount value.
Remove from the list all the elements that don't have the"correct count".

All elements that remain on the list will be queued forplay.

4.1.7 Timeout

The timeout attribute specifies the interval of silenceallowed while waiting for user input after the end of the lastprompt. If this interval is exceeded, the platform will throw anoinput event. This attribute defaults to the value specified bythe timeout property (seeSection 6.3.4) at the time the prompt isqueued. In other words, each prompt has its own timeoutvalue.

The reason for allowing timeouts to be specified as promptattributes is to support tapered timeouts. For example, the usermay be given five seconds for the first input attempt, and tenseconds on the next.

The prompt timeout attribute determines the noinput timeoutfor the following input:

<prompt count="1">  Pick a color for your new Model T.</prompt><prompt count="2" timeout="120s">  Please choose color of your new nineteen twenty four  Ford Model T. Possible colors are black, black, or  black.  Please take your time.</prompt>

If several prompts are queued before a field input, thetimeout of the last prompt is used.

4.1.8 Prompt Queueing and Input Collection

A VoiceXML interpreter is at all times in one of twostates:

waiting for input in an input item (such as<field>, <record>, or <transfer>),or
transitioning between input items in response to aninput (including spoken utterances, dtmf key presses, andinput-related events such as a noinput or nomatch event) receivedwhile in the waiting state. While in the transitioning state nospeech input is collected, accepted or interpreted. Consequentlyroot and document level speech grammars (such as defined in<link>s) may not be active at all times. However, DTMFinput (including timing information) should be collected andbuffered in the transition state. Similarly, asynchronouslygenerated events not related directly to execution of thetransition should also be buffered until the waiting state (e.g.connection.disconnect.hangup).

The waiting and transitioning states are related to the phasesof the Form Interpretation Algorithm as follows:

the waiting state is eventually entered in the collect phaseof an input item (at the point at which the interpreter waits forinput), and
the transitioning state encompasses the process and selectphases, the collect phase for control items (such as<block>s), and the collect phase for input items up untilthe point at which the interpreter waits for input.

This distinction of states is made in order to greatlysimplify the programming model. In particular, an importantconsequence of this model is that the VoiceXML applicationdesigner can rely on all executable content (such as the contentof <filled> and <block> elements) being run tocompletion, because it is executed while in the transitioningstate, which may not be interrupted by input.

While in the transitioning state various prompts are queued,either by the <prompt> element in executable content or bythe <prompt> element in form items. In addition, audio maybe queued by the fetchaudio attribute. The queued prompts andaudio are played either

when the interpreter reaches the waiting state, at whichpoint the prompts are played and the interpreter listens forinput that matches one of the active grammars,or
when the interpreter begins fetching a resource (such as adocument) for which fetchaudio was specified. In this case theprompts queued before the fetchaudio are played to completion,and then, if the resource actually needs to be fetched (i.e. itis not unexpired in the cache), the fetchaudio is played untilthe fetch completes. The interpreter remains in the transitioningstate and no input is accepted during the fetch.

Note that when a prompt's bargein attribute is false, input isnot collected and DTMF input buffered in a transition state isdeleted (seeSection4.1.5).

When an ASR grammar is matched, if DTMF input was consumed bya simultaneously active DTMF grammar (but did not result in acomplete match of the DTMF grammar), the DTMF input may, atprocessor discretion, be discarded.

Before the interpreter exits all queued prompts are played tocompletion. The interpreter remains in the transitioning stateand no input is accepted while the interpreter is exiting.

It is a permissible optimization to begin playing promptsqueued during the transitioning state before reaching the waitingstate, provided that correct semantics are maintained regardingprocessing of the input audio received while the prompts areplaying, for example with respect to bargein and grammarprocessing.

The following examples illustrate the operation of these rulesin some common cases.

Case 1

Typical non-fetching case: field, followed by executablecontent (such as <block> and <filled>), followed byanother field.

 in document d0    <field name="f0"/>    <block>        executable content e1        queues prompts {p1}    </block>    <field name="f2">        queues prompts {p2}        enables grammars {g2}    </field>

As a result of input received while waiting in field f0 thefollowing actions take place:

in transitioning state
- execute e1 (without goto)
- queue prompts {p1}
- queue prompts {p2}
in waiting state, simultaneously
- play prompts {p1,p2}
- enable grammars {g2} and wait for input

Case 2

Typical fetching case: field, followed by executable content(such as <block> and <filled>) ending with a<goto> that specifies fetchaudio, ending up in a field in adifferent document that is fetched from a server.

 in document d0    <field name="f0"/>    <block>        executable content e1        queues prompts {p1}        ends with goto f2 in d1 with fetchaudio fa    </block>in document d1    <field name="f2">        queues prompts {p2}        enables grammars {g2}    </field>

As a result of input received while waiting in field f0 thefollowing actions take place:

in transitioning state
- execute e1
- queue prompts {p1}
- simultaneously
  - fetch d1
  - play prompts {p1} to completion and then play fa until fetchcompletes
- queue prompts {p2}
in waiting state, simultaneously
- play prompts {p2}
- enable grammars {g2} and wait for input

Case 3

As in Case 2, but no fetchaudio is specified.

 in document d0    <field name="f0"/>    <block>        executable content e1        queues prompts {p1}        ends with goto f2 in d1 (no fetchaudio specified)    </block>in document d1    <field name="f2">        queues prompts {p2}        enables grammars {g2}    </field>

As a result of input received while waiting in field f0 thefollowing actions take place:

in transitioning state
- execute e1
- queue prompts {p1}
- fetch d1
- queue prompts {p2}
in waiting state, simultaneously
- play prompts {p1, p2}
- enable grammars {g2} and wait for input

5. Controlflow and scripting

5.1Variables and Expressions

VoiceXML variables are in all respects equivalent toECMAScript variables: they are part of the same variable space.VoiceXML variables can be used in a <script> just asvariables defined in a <script> can be used in VoiceXML.Declaring a variable using <var> is equivalent to using a'var' statement in a <script> element. <script> canalso appear everywhere that <var> can appear. VoiceXMLvariables are also declared by form items.

The variable naming convention is as in ECMAScript, but namesbeginning with the underscore character ("_") and names endingwith a dollar sign ("$") are reserved for internal use. VoiceXMLvariables, including form item variables, must not containECMAScript reserved words. They must also follow ECMAScript rulesfor referential correctness. For example, variable names must beunique and their declaration must not include a dot - "var x.y"is an illegal declaration in ECMAScript. Variable names whichviolate naming conventions or ECMAScript rules cause an'error.semantic' event to be thrown.

5.1.1 Declaring Variables

Variables are declared by <var> elements:

<var name="home_phone"/> <var name="pi" expr="3.14159"/> <var name="city" expr="'Sacramento'"/>

They are also declared by form items:

<field name="num_tickets">    <grammar type="application/srgs+xml" src="/grammars/number.grxml"/>   <prompt>How many tickets do you wish to purchase?</prompt> </field>

Variables declared without an explicit initial value areinitialized to the ECMAScript undefined value. Variables mustbe declared before being used either in VoiceXML or ECMAScript.Use of an undeclared variable results in an ECMAScript errorwhich is thrown as an error.semantic. Variables declared using"var" in ECMAScript can be used in VoiceXML, just as declaredVoiceXML variables can be used in ECMAScript.

In a form, the variables declared by <var> and thosedeclared by form items are initialized when the form is entered.The initializations are guaranteed to take place in documentorder, so that this, for example, is legal:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>    <var name="one" expr="1"/>    <field name="two" expr="one+1">     <grammar type="application/srgs+xml" src="/grammars/number.grxml"/>   </field>    <var name="three" expr="two+1"/>    <field name="go_on" type="boolean">      <prompt>Say yes or no to continue</prompt>    </field> </form></vxml>

When the user visits this <form>, the form'sinitialization first declares the variable one and sets its valueto 1. Then it declares the variable two and gives it the value 2.Then the initialization logic declares the variable three andgives it the value 3. The form interpretation algorithm thenenters its main interpretation loop and begins at the go_onfield.

5.1.2 Variable Scopes

VoiceXML uses an ECMAScript scope chain to allow variables tobe declared at different levels of hierarchy in an application.For instance, a variable declared at document scope can bereferenced anywhere within that document, whereas a localvariable declared in a catch element is only available withinthat catch element. In order to preserve these scoping semantics,all ECMAScript variables must be declared. Use of an undeclaredvariable results in an ECMAScript error which is thrown as anerror.semantic.

Variables can be declared in following scopes:

Table 40: Variable Scopes
session	These are read-only variables thatpertain to an entire user session. They are declared and set bythe interpreter context. New session variables cannot be declaredby VoiceXML documents. SeeSection 5.1.4.
application	These are declared with <var>and <script> elements that are children of theapplication root document's <vxml> element. They areinitialized when the application root document is loaded. Theyexist while the application root document is loaded, and arevisible to the root document and any other loaded applicationleaf document. Note that while executing inside theapplication root document document.x is equivalent toapplication.x.
document	These variables are declared with<var> and <script> elements that are children ofthe document's <vxml> element. They are initialized whenthe document is loaded. They exist while the document is loaded.They are visible only within that document, unless the documentis an application root, in which case the variables are visibleby leaf documents through the application scope only.
dialog	Each dialog (<form> or<menu>) has a dialog scope that exists while the user isvisiting that dialog, and which is visible to the elements ofthat dialog. Dialog scope contains the following variables:variables declared by <var> and <script> childelements of <form>, form item variables, and form itemshadow variables. The child <var> and <script>elements of <form> are initialized when the form is firstvisited, as opposed to <var> elements inside executablecontent which are initialized when the executable content isexecuted.
(anonymous)	Each <block>, <filled>,and <catch> element defines a new anonymous scope tocontain variables declared in that element.

The following diagram shows the scope hierarchy:

flow from anonymous via dialog, document, application and session
Figure 11: The scope hierarchy.

The curved arrows in this diagram show that each scopecontains a pre-defined variable whose name is the same as thescope that refers to the scope itself. This allows you forexample in the anonymous, dialog, and document scopes to refer toa variableX in the document scope usingdocument.X. As another example, a <filled>'svariable scope is an anonymous scope local to the <filled>,whose parent variable scope is that of the <form>.

It is not recommended to use "session", "application","document", and "dialog" as the names of variables and formitems. While they are not reserved words, using them hides thepre-defined variables with the same name because of ECMAScriptscoping rules used by VoiceXML.

5.1.3 Referencing Variables

Variables are referenced in cond and expr attributes:

<if cond="city == 'LA'">    <assign name="city" expr="'Los Angeles'"/> <elseif cond="city == 'Philly'"/>    <assign name="city" expr="'Philadelphia'"/> <elseif cond="city =='Constantinople'"/>    <assign name="city" expr="'Istanbul'"/> </if> <assign name="var1" expr="var1 + 1"/> <if cond="i &gt; 1">    <assign name="i" expr="i-1"/> </if>

The expression language used in cond and expr is preciselyECMAScript. Note that the cond operators "<", "<=", and"&&" must be escaped in XML (to "<" and soon).

Variable references match the closest enclosing scopeaccording to the scope chain given above. You can prefix areference with a scope name for clarity or to resolve ambiguity.For instance to save the value of a variable associated with oneof the fields in a form for use later on in adocument:

<assign name="document.ssn" expr="dialog.ssn"/>

If the application root document has a variable x, it isreferred to as application.x in non-root documents, and eitherapplication.x or document.x in the application root document.If the document does not have a specified application rootand has a variable x, it is referred to as either application.xor document.x in the document.

5.1.4 Standard Session Variables

session.connection.local.uri: This variable is a URI which addresses the local interpretercontext device.
session.connection.remote.uri: This variable is a URI which addresses the remote callerdevice.
session.connection.protocol.name: This variable is the name of the connection protocol. Thename also represents the subobject name for protocol specificinformation. For instance, if session.connection.protocol.name is'q931', session.connection.protocol.q931.uui might specify theuser-to-user information property of the connection.
session.connection.protocol.version: This variable is the version of the connection protocol.
session.connection.redirect: This variable is an array representing the connectionredirection paths. The first element is the original callednumber, the last element is the last redirected number. Eachelement of the array contains a uri, pi (presentationinformation), si (screening information), and reason property.The reason property can be either "unknown", "user busy", "noreply", "deflection during alerting", "deflection immediateresponse", "mobile subscriber not reachable".
session.connection.aai: This variable is application-to-application informationpassed during connection setup.
session.connection.originator: This variable directly references either the local or remoteproperty (For instance, the following ECMAScript would returntrue if the remote party initiated the connection: varcaller_initiate = connection.originator ==connection.remote).

5.1.5 Standard Application Variables

application.lastresult$

This variable holds information about the last recognition tooccur within this application. It is an array of elements whereeach element, application.lastresult$[i], represents a possibleresult through the following variables:

application.lastresult$[i].confidence: The whole utterance confidence level for this interpretationfrom 0.0-1.0. A value of 0.0 indicates minimum confidence, and avalue of 1.0 indicates maximum confidence. More specificinterpretation of a confidence value is platform-dependent.
application.lastresult$[i].utterance: The raw string of words that were recognized for thisinterpretation. The exact tokenization and spelling isplatform-specific (e.g. "five hundred thirty" or "5 hundred 30"or even "530"). In the case of a DTMF grammar, this variable willcontain the matched digit string.
application.lastresult$[i].inputmode: For this interpretation,the mode in which user input wasprovided: dtmf or voice.
application.lastresult$[i].interpretation: An ECMAScript variable containing the interpretation asdescribed inSection3.1.5.

Interpretations are sorted by confidence score, from highestto lowest. Interpretations with the same confidence score arefurther sorted according to the precedence relationship (seeSection 3.1.4) among thegrammars producing the interpretations. Different elements inapplication.lastresult$ will always differ in their utterance,interpretation, or both.

The number of application.lastresult$ elements is guaranteedto be greater than or equal to one and less than or equal to thesystem property "maxnbest". If no results have been generated bythe system, then "application.lastresult$" shall be ECMAScriptundefined.

Additionally, application.lastresult$ itself contains theproperties confidence, utterance, inputmode, and interpretationcorresponding to those of the 0th element in the ECMAScriptarray.

All of the shadow variables described above are setimmediately after any recognition. In this context, a<nomatch> event counts as a recognition, and causes thevalue of "application.lastresult$" to be set, though thevalues stored in application.lastresult$ are platform dependent.In addition, the existing values of field variables are notaffected by a <nomatch>. In contrast, a<noinput> event does not change the value of"application.lastresult$". After the value of"application.lastresult$" is set, the value persists(unless it is modified by the application) until thebrowser enters the next waiting state, when it is set toundefined. Similarly, when an application root documentis loaded, this variable is set to the valueundefined.The variable application.lastresult$ and all of itscomponents are writeable and can be modified by theapplication.

The following example shows how application.lastresult$ can beused in a field level <catch> to access a <link>grammar recognition result and transition to different dialogstates depending on confidence:

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><link event="menulinkevent">    <grammar src="/grammars/linkgrammar.grxml" type="application/srgs+xml"/></link><form>    <field>        <prompt> Say something </prompt>         <catch event="menulinkevent">           <if cond="application.lastresult$.confidence &lt; 0.7">              <goto nextitem="confirmlinkdialog"/>           <else/>              <goto next="./main_menu.html"/>           </if>        </catch>    </field></form></vxml>

The final example demonstrates how a script can be used toiterate over the array of results in application.lastresult$,where each element is represented by"application.lastresult$[i]":

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"> <form>    <field name="color">        <prompt> Say a color </prompt>    <grammar type="application/srgs+xml" src="color.grxml" />         <filled>        <var name="confident_count" expr="0"/>            <script>            <![CDATA[         // number of results          var len = application.lastresult$.length;         // iterate through array         for (var i = 0; i < len; i++) {           // check if DTMF            if (application.lastresult$[i].confidence > .7) {             confident_count++;           }         }        ]]>        </script>        <if cond="confident_count > 1">           <goto next="#verify"/>        </if>       </filled>    </field></form></vxml>

5.2Event Handling

The platform throws events when the user does not respond,doesn't respond in a way that the application understands,requests help, etc. The interpreter throws events if it findsa semantic error in a VoiceXML document, or when it encountersa <throw> element. Events are identified by characterstrings.

Each element in which an event can occur has a set ofcatchelements, which include:

<catch>
<error>
<help>
<noinput>
<nomatch>

An element inherits the catch elements ("as if by copy") fromeach of its ancestor elements, as needed. If a field, forexample, does not contain a catch element for nomatch, but itsform does, the form's nomatch catch element is used. Inthis way, common event handling behavior can be specified at anylevel, and it applies to all descendents.

The "as if by copy" semantics for inheriting catch elementsimplies that when a catch element is executed, variables areresolved and thrown events are handled relative to the scopewhere the original event originated, not relative to the scopethat contains the catch element. For example, consider a catchelement that is defined at document scope handling an event thatoriginated in a <field> within the document. In such acatch element variable references are resolved relative to the<field>'s scope, and if an event is thrown by the catchelement it is handled relative to the <field>. Similarly,relative URI references in a catch element are resolved againstthe active document and not relative to the document in whichthey were declared. Finally, properties are resolved relative tothe element where the event originated. For example, a promptelement defined as part of a document level catch would use theinnermost property value of the active form item to resolve itstimeout attribute if no value is explicitly specified.

5.2.1 throw element

The <throw> element throws an event. These can be thepre-defined ones:

<throw event="nomatch"/> <throw event="connection.disconnect.hangup"/>

or application-defined events:

<throw event="com.att.portal.machine"/>

Attributes of <throw> are:

Table 41: <throw>Attributes
event	The event being thrown.
eventexpr	An ECMAScript expression evaluatingto the name of the event being thrown.
message	A message string providing additionalcontext about the event being thrown. For the pre-defined eventsthrown by the platform, the value of the message isplatform-dependent. The message is available as the value of a variable within thescope of the catch element, see below.
messageexpr	An ECMAScript expression evaluatingto the message string.

Exactly one of "event" or "eventexpr" must be specified;otherwise, an error.badfetch event is thrown. Exactly one of"message" or "messageexpr" may be specified; otherwise, anerror.badfetch event is thrown.

Unless explicited stated otherwise, VoiceXML does not specifywhen events are thrown.

5.2.2 catch element

The catch element associates a catch with a document, dialog,or form item (except for blocks). It contains executablecontent.

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>  <field name="user_id" type="digits">     <prompt>What is your username</prompt>   </field>  <field name="password">     <prompt>What is the code word?</prompt>     <grammar version="1.0" root="root">     <rule scope="public">rutabaga</rule>    </grammar>   <help>It is the name of an obscure vegetable.</help>    <catch event="nomatch noinput" count="3">      <prompt>Security violation!</prompt>      <submit next="http://www.example.com/apprehend_felon.vxml"                 namelist="user_id"/>    </catch>   </field> </form></vxml>

The catch element's anonymous variable scope includes thespecial variable _event which contains the name of the event thatwas thrown. For example, the following catch element can handletwo types of events:

<catch event="event.foo event.bar">  <if cond="_event=='event.foo'">    <!-- Play this for event.foo events -->    <audio src="foo.wav"/>  <else/>    <!-- Play this for event.bar events -->    <audio src="bar.wav"/>  </if>  <!-- Continue with common handling for either event --></catch>

The _event variable is inspected to select the audio to playbased on the event that was thrown. The foo.wav file will beplayed for event.foo events. The bar.wav file will be played forevent.bar events. The remainder of the catch element containsexecutable content that is common to the handling of both eventtypes.

The catch element's anonymous variable scope also includes thespecial variable _message which contains the value of the messagestring from the corresponding <throw> element, or aplatform-dependent value for the pre-defined events raised by theplatform. If the thrown event does not specify a message, thevalue of _message is ECMAScript undefined.

If a <catch> element contains a <throw> elementwith the same event, then there may be an infinite loop:

<catch event="help">    <throw event="help"/> </catch>

A platform could detect this situation and throw a semanticerror instead.

Attributes of <catch> are:

Table 42: <catch>Attributes
event	The event or events to catch. Aspace-separated list of events may be specified, indicating thatthis <catch> element catches all the events named in thelist. In such a case a separate event counter (see "count"attribute) is maintained for each event. If the attribute isunspecified, all events are to be caught.
count	The occurrence of the event (defaultis 1). The count allows you to handle different occurrences ofthe same event differently. Each <form>, <menu>, and form item maintains acounter for each event that occurs while it is being visited.Item-level event counters are used for events thrown whilevisiting individual form items and while executing <filled>elements contained within those items. Form-level and menu-levelcounters are used for events thrown during dialog initializationand while executing form-level <filled> elements. Form-level and menu-level event counters are reset each timethe <menu> or <form> is re-entered. Form-level andmenu-level event counters are not reset by the <clear>element. Item-level event counters are reset each time the <form>containing the item is re-entered. Item-level event counters arealso reset when the item is reset with the <clear> element.An item's event counters are not reset when the item isre-entered without leaving the <form>. Counters are incremented against the full event name and everyprefix matching event name; for example, occurrence of the event"event.foo.1" increments the counters for "event.foo.1" plus"event.foo" and "event".
cond	An expression which must evaluate totrue after conversion to boolean in order for the event to becaught. Defaults to true.

5.2.3 Shorthand Notation

The <error>, <help>, <noinput>, and<nomatch> elements are shorthands for very common types of<catch> elements.

The <error> element is short for <catchevent="error"> and catches all events of type error:

<error>  An error has occurred -- please call again later.  <exit/></error>

The <help> element is an abbreviation for <catchevent="help">:

<help>No help is available.</help>

The <noinput> element abbreviates <catchevent="noinput">:

<noinput>I didn't hear anything, please try again.</noinput>

And the <nomatch> element is short for <catchevent="nomatch">:

<nomatch>I heard something, but it wasn't a known city.</nomatch>

These elements take the attributes:

Table 43: Shorthand CatchAttributes
count	The event count (as in<catch>).
cond	An optional condition to test to seeif the event is caught by this element (as in <catch>described inSection 5.2.2).Defaults to true.

5.2.4 catch element selection

An element inherits the catch elements ("as if by copy") fromeach of its ancestor elements, as needed. For example, if a<field> element inherits a <catch> element from thedocument

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><catch event="event.foo">    <audio src="beep.wav"/></catch><form>    <field name="color">        <prompt>Please say a primary color</prompt>    <grammar type="application/srgs">red | yellow | blue</grammar>        <nomatch>            <throw event="event.foo"/>        </nomatch>    </field></form></vxml>

then the <catch> element is implicitly copied into<field> as if defined below:

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>    <field>        <prompt>Please say a primary color</prompt>    <grammar type="application/srgs">red | yellow | blue</grammar>        <nomatch>            <throw event="event.foo"/>        </nomatch>        <catch event="event.foo">            <audio src="beep.wav"/>        </catch>    </field></form></vxml>

When an event is thrown, the scope in which the event ishandled and its enclosing scopes are examined to find thebestqualified catch element, according to the followingalgorithm:

Form an ordered list of catches consisting of all catches inthe current scope and all enclosing scopes (form item, form,document, application root document, interpreter context),ordered first by scope (starting with the current scope), andthen within each scope by document order.
Remove from this list all catches whose event name does notmatch the event being thrown or whose cond evaluates to falseafter conversion to boolean.
Find the "correct count": the highest count among the catchelements still on the list less than or equal to the currentcount value.
Select the first element in the list with the "correctcount".

The name of a thrown event matches the catch element eventname if it is an exact match, a prefix match or if the catchevent attribute is not specified (note that the event attributecannot be specified as an empty string - event="" is syntacticallyinvalid). A prefix match occurs when the catch element eventattribute is a token prefix of the name of the event being thrown,where the dot is the token separator, all trailing dots areremoved, and a remaining empty string matches everything. Forexample,

<catch event="connection.disconnect">   <prompt>Caught a connection dot disconnect event</prompt></catch>

will prefix match the eventconnection.disconnect.transfer.

<catch event="com.example.myevent">   <prompt>Caught a com dot example dot my event</prompt></catch>

prefix matches com.example.myevent.event1.,com.example.myevent. and com.example.myevent..event1 but notcom.example.myevents.event1. Finally,

<catch event=".">   <prompt>Caught an event</prompt></catch>

prefix matches all events (as does <catch> without anevent attribute).

Note that the catch element selection algorithm gives priorityto catch elements that occur earlier in a document over thosethat occur later, but does not give priority to catch elementsthat are more specific over those that are less specific.Therefore is generally advisable to specify catch elements inorder from more specific to less specific. For example, it wouldbe advisable to specify catch elements for "error.foo" and"error" in that order, as follows:

<catch event="error.foo">   <prompt>Caught an error dot foo event</prompt></catch> <catch event="error">   <prompt>Caught an error event</prompt> </catch>

If the catch elements were specified in the opposite order,the catch element for "error.foo" would never be executed.

5.2.5 Default catch elements

The interpreter is expected to provide implicit default catchhandlers for the noinput, help, nomatch, cancel, exit, and errorevents if the author did not specify them.

The system default behavior of catch handlers for variousevents and errors is summarized by the definitions below thatspecify (1) whether any audio response is to be provided, and (2)how execution is affected. Note: where an audio response isprovided, the actual content is platform dependent.

Table 44: Default CatchHandlers
Event Type	Audio Provided	Action
cancel	no	don't reprompt
error	yes	exit interpreter
exit	no	exit interpreter
help	yes	reprompt
noinput	no	reprompt
nomatch	yes	reprompt
maxspeechtimeout	yes	reprompt
connection.disconnect	no	exit interpreter
all others	yes	exit interpreter

Specific platforms will differ in the default promptspresented.

5.2.6 Event Types

There are pre-defined events, and application andplatform-specific events. Events are also subdivided into plainevents (things that happen normally), and error events (abnormaloccurrences). The error naming convention allows for multiplelevels of granularity.

A conforming browser may throw an event that extends apre-defined event string so long as the event contains thespecified pre-defined event string as a dot-separated exactinitial substring of its event name. Applications that writecatch handlers for the pre-defined events will be interoperable.Applications that write catch handlers for extended event namesare not guaranteed interoperability. For example, if in loading agrammar file a syntax error is detected the platform must throw"error.badfetch". Throwing "error.badfetch.grammar.syntax" is anacceptable implementation.

Components of event names in italics are to be substitutedwith the relevant information; for example, inerror.unsupported.element,element is substitutedwith the name of VoiceXML element which is not supported such aserror.unsupported.transfer. All other event name components arefixed.

Further information about an event may be specified in the"_message" variable (seeSection5.2.2).

The pre-defined events are:

cancel: The user has requested to cancel playing of the currentprompt.
connection.disconnect.hangup: The user has hung up.
connection.disconnect.transfer: The user has been transferred unconditionally to another lineand will not return.
exit: The user has asked to exit.
help: The user has asked for help.
noinput: The user has not responded within the timeout interval.
nomatch: The user input something, but it was not recognized.
maxspeechtimeout: The user input was too long exceeding the 'maxspeechtimeout'property.

In addition to transfer errors (Section 2.3.7.3), the pre-defined errorsare:

error.badfetch: The interpreter context throws this event when a fetch of adocument has failedand the interpreter context hasreached a place in the document interpretation where the fetchresult is required. Fetch failures result from unsupported schemereferences, malformed URIs, client aborts, communication errors,timeouts, security violations, unsupported resource types,resource type mismatches, document parse errors, and a variety oferrors represented by scheme-specific error codes.; If the interpreter context has speculatively prefetched adocument and that document turns out not to be needed,error.badfetch is not thrown. Likewise if the fetch of an<audio> document fails and if there is a nested alternate<audio> document whose fetch then succeeds, or if there isnested alternate text, no error.badfetch occurs.; When an interpreter context is transitioning to a newdocument, the interpreter context throws error.badfetch on anerror until the interpreter is capable of executing the newdocument, but again only at the point in time where the newdocument is actually needed, not before. Whether or not variableinitialization is considered part of executing the new documentis platform-dependent.
error.badfetch.http.response_codeerror.badfetch.protocol.response_code: In the case of a fetch failure, the interpreter context mustuse a detailed event type telling which specific HTTP or otherprotocol-specific response code was encountered. The value of theresponse code for HTTP is defined in[RFC2616]. This allows applications todifferentially treat a missing document from a prohibiteddocument, for instance. The value of the response code for otherprotocols (such as HTTPS, RTSP, and so on) is dependent upon theprotocol.
error.semantic: A run-time error was found in the VoiceXML document, e.g.substring bounds error, or an undefined variable wasreferenced.
error.noauthorization: Thrown when the application tries to perform anoperation that is not authorized by the platform. Examples wouldinclude dialing an invalid telephone number or one which the useris not allowed to call, attempting to access a protected databasevia a platform-specific <object>, inappropriate access tobuiltin grammars, etc.
error.noresource: A run-time error occurred because a requested platformresource was not available during execution.
error.unsupported.builtin: The platform does not support a requested builtintype/grammar.
error.unsupported.format: The requested resource has a format that is not supported bythe platform, e.g. an unsupported grammar format, or mediatype.
error.unsupported.language: The platform does not support the language for either speechsynthesis or speech recognition.
error.unsupported.objectname: The platform does not support a particular platform-specificobject. Note that 'objectname' is a fixed string and is notsubstituted with the name of the unsupported object.
error.unsupported.element: The platform does not support the givenelement,whereelement is a VoiceXML element defined in thisspecification. For instance, if a platform does not implement<transfer>, it must throw error.unsupported.transfer.This allows an author to use event handling to adapt todifferent platform capabilities.

Errors encountered during document loading, includingtransport errors (no document found, HTTP status code 404, and soon) and syntactic errors (no <vxml> element, etc) result ina badfetch error event raised in the calling document. Errorsthat occur after loading and before entering the initializationphase of the Form Interpretation Algorithm are handled in aplatform-specific manner. Errors that occur after entering theFIA initialization phase, such as semantic errors, are raised inthe new document. The handling of errors encountered during theloading of the first document in a session isplatform-specific.

Application-specific and platform-specific event types shoulduse the reversed Internet domain name convention to avoid namingconflicts. For example:

error.com.example.voiceplatform.noauth: The user is not authorized to dial out on this platform.
org.example.voice.someapplication.toomanynoinputs: The user is far too quiet.

Catches can catch specific events (cancel) or all thosesharing a prefix (error.unsupported).

5.3Executable Content

Executable content refers to a block of procedurallogic. Such logic appears in:

The <block> form item.
The <filled> actions in forms and input items.
Event handlers (<catch>, <help>, et cetera).

Executable elements are executed in document order in theirblock of procedural logic. If an executable element generates anerror, that error is thrown immediately. Subsequent executableelements in that block of procedural logic are not executed.

This section covers the elements that can occur in executablecontent.

5.3.1 var element

This element declares a variable. It can occur in executablecontent or as a child of <form> or <vxml>.Examples:

<var name="phone" expr="'6305551212'"/> <var name="y" expr="document.z+1"/>

If it occurs in executable content, it declares a variable inthe anonymous scope associated with the enclosing <block>,<filled>, or catch element. This declaration is made onlywhen the <var> element is executed. If the variable isalready declared in this scope, subsequent declarations act asassignments, as in ECMAScript.

If a <var> is a child of a <form> element, itdeclares a variable in the dialog scope of the <form>. Thisdeclaration is made during the form's initialization phaseas described inSection2.1.6.1. The <var> element is not a form item, and sois not visited by the Form Interpretation Algorithm's mainloop.

If a <var> is a child of a <vxml> element, itdeclares a variable in the document scope; and if it is thechild of a <vxml> element in a root document then it alsodeclares the variable in the application scope. This declarationis made when the document is initialized; initializations happenin document order.

Attributes of <var> include:

Table 45: <var>Attributes
name	The name of the variable that willhold the result. Unlike the name attribute of <assign>element (Section 5.3.2),this attribute must not specify a variable with a scope prefix(if a variable is specified with a scope prefix, then anerror.semantic event is thrown). The scope in which the variableis defined is determined from the position in the document atwhich the element is declared.
expr	The initial value of the variable(optional). If there is no expr attribute, the variable retainsits current value, if any. Variables start out with theECMAScript value undefined if they are not given initialvalues.

5.3.2 assign element

The <assign> element assigns a value to a variable:

<assign name="flavor" expr="'chocolate'"/> <assign name="document.mycost" expr="document.mycost+14"/>

It is illegal to make an assignment to a variable that has notbeen explicitly declared using a <var> element or a varstatement within a <script>. Attempting to assign to anundeclared variable causes an error.semantic event to bethrown.

Note that when an ECMAScript object, e.g. "obj", has beenproperly initialized then its properties, for instance"obj.prop1", can be assigned without explicit declaration (infact, an attempt to declare ECMAScript object properties such as"obj.prop1" would result in an error.semantic event beingthrown).

Attributes include:

Table 46: <assign>Attributes
name	The name of the variable beingassigned to. As specified inSection 5.1.2, the corresponding variablemust have been previously declared otherwise an error.semanticevent is thrown. By default, the scope in which the variable isresolved is the closest enclosing scope of the currently activeelement. To remove ambiguity, the variable name may be prefixedwith a scope name as described inSection 5.1.3.
expr	The new value of the variable.

5.3.3 clear element

The <clear> element resets one or more variables,including form items. For each specified variable name, thevariable is resolved relative to the current scope according toSection 5.1.3 (to removeambiguity, each variable name in the namelist may be prefixedwith a scope name). Once a declared variable has been identified,its value is assigned the ECMAScript undefined value. Inaddition, if the variable name corresponds to a form item, thenthe form item's prompt counter and event counters arereset.

For example:

<clear namelist="city state zip"/>

The attribute is:

Table 47: <clear>Attributes
namelist	The list of variables to be reset;this can include variable names other than form items. If anundeclared variable is referenced in the namelist, then anerror.semantic is thrown (Section 5.1.1). When not specified, allform items in the current form are cleared.

5.3.4 if, elseif, else elements

The <if> element is used for conditional logic. It hasoptional <else> and <elseif> elements.

<if cond="total > 1000">   <prompt>This is way too much to spend.</prompt>   <throw event="com.xyzcorp.acct.toomuchspent"/> </if> <if cond="amount &lt; 29.95">   <assign name="x" expr="amount"/> <else/>   <assign name="x" expr="29.95"/> </if> <if cond="flavor == 'vanilla'">   <assign name="flavor_code" expr="'v'"/> <elseif cond="flavor == 'chocolate'"/>   <assign name="flavor_code" expr="'h'"/> <elseif cond="flavor == 'strawberry'"/>   <assign name="flavor_code" expr="'b'"/> <else/>   <assign name="flavor_code" expr="'?'"/> </if>

5.3.5 prompts

Prompts can appear in executable content, in their fullgenerality, except that the <prompt> count attribute ismeaningless. In particular, the cond attribute can be used inexecutable content. Prompts may be wrapped with <prompt>and </prompt>, or represented using PCDATA. Wherever<prompt> is allowed, the PCDATAxyz is interpretedexactly as if it had appeared as<prompt>xyz</prompt>.

<nomatch count="1">   To open the pod bay door, say your code phrase clearly. </nomatch> <nomatch count="2">   <prompt>    This is your <emphasis>last</emphasis> chance.  </prompt> </nomatch> <nomatch count="3">   Entrance denied.   <exit/> </nomatch>

5.3.6 reprompt element

The FIA expects a catch element to queue appropriate promptsin the course of handling an event. Therefore, the FIA does notgenerally perform the normal selection and queuing of prompts onthe next iteration following the execution of a catch element.However, the FIA does perform normal selection and queueing ofprompts after the execution of a catch element (<catch>,<error>, <help>, <noinput>, <nomatch>) intwo cases:

if the catch element ends by executing a <goto> or<submit> to another dialog, or if it ends with a<return> from a subdialog; in this case the new dialogneeds to be guaranteed that its initial prompt remains intact andcannot be suppressed or replaced by a referring dialog; or
if a <reprompt> is executed in the catch to requestthat the subsequent prompts be played.

In these two cases, after the FIA selects the next form itemto visit, it performs normal prompt processing, includingselecting and queuing the form item's prompts and incrementingthe form item's prompt counter.

For example, this noinput catch expects the next form itemprompt to be selected and played:

<field name="want_ice_cream">   <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>  <prompt>Do you want ice cream for dessert?</prompt>   <prompt count="2">     If you want ice cream, say yes.     If you do not want ice cream, say no.   </prompt>   <noinput>     I could not hear you.     <!-- Cause the next prompt to be selected and played. -->     <reprompt/>  </noinput> </field>

A quiet user would hear:

C: Do you want ice cream for dessert?
H:(silence)
C: I could not hear you.
C: If you want ice cream, say yes. If you don't want icecream, say no.
H:(silence)
C: I could not hear you.
C: If you want ice cream, say yes. If you don't want icecream, say no.
H: No

If there were no <reprompt>, the user would insteadhear:

C: Do you want ice cream for dessert?
H:(silence)
C: I could not hear you.
H:(silence)
C: I could not hear you.
H: No

Note that a consequence of skipping the prompt selection phaseas described above is that the prompt counter of the form itemselected by the FIA after the execution of a catch element (thatdoes not execute a <reprompt>, or leave the dialog via<goto>, <submit> or <return>) will not beincremented.

Also note that the prompt selection phase following theexecution of a catch element (that does not execute a<reprompt> or leave the dialog via <goto>,<submit> or <return>) is skipped even if the formitem selected by the FIA is different from the previous formitem.

The <reprompt> element has no effect outside of acatch.

5.3.7 goto element

The <goto> element is used to:

transition to another form item in the current form,
transition to another dialog in the current document, or
transition to another document.

To transition to another form item, use the nextitemattribute, or the expritem attribute if the form item name iscomputed using an ECMAScript expression:

<goto nextitem="ssn_confirm"/> <goto expritem="(type==12)? 'ssn_confirm' : 'reject'"/>

To go to another dialog in the same document, use next (orexpr) with only a URI fragment:

<goto next="#another_dialog"/> <goto expr="'#' + 'another_dialog'"/>

To transition to another document, use next (or expr) with aURI:

<goto next="http://flight.example.com/reserve_seat"/> <goto next="./special_lunch#wants_vegan"/>

The URI may be absolute or relative to the current document.You may specify the starting dialog in the next document using afragment that corresponds to the value of the id attribute of adialog. If no fragment is specified, the first dialog in thatdocument is chosen.

Note that transitioning to another dialog in the currentdocument causes the old dialog's variables to be lost, evenin the case where a dialog is transitioning to itself.Transitioning to another document using an absolute or relativeURI will likewise drop the old document level variables, even ifthe new document is the same one that is making the transition.However, document variables are retained when transitioning to anempty URI reference with a fragment identifier. For example, thefollowing statements behave differently in a document with theURI http://someco.example.com/index.vxml:

<goto next="#foo"/><goto next="http://someco.example.com/index.vxml#foo"/>

According to[RFC2396], the fragment identifier (the partafter the '#') is not part of a URI and transitioning to emptyURI references plus fragment identifiers should never result in anew document fetch. Therefore "#foo" in the first statement is anempty URI reference with a fragment identifier and documentvariables are retained. In the second statement "#foo" is part ofan absolute URI and the document variables are lost. If you wantdata to persist across multiple documents, store data in theapplication scope.

The dialog to transition to is specified by the URI referencein the <goto>'s next or expr attribute (see[RFC2396]). If this URIreference contains an absolute or relative URI, which mayinclude a query string, then that URI is fetched and the dialogis found in the resulting document.

If the URI reference contains only a fragment (i.e., noabsolute or relative URI), then there is no fetch: the dialog isfound in the current document.

The URI reference's fragment, if any, names the dialog totransition to. When there is no fragment, the dialog chosen isthe lexically first dialog in the document.

If the form item, dialog or document to transition to is notvalid (i.e. the form item, dialog or document does not exist),an error.badfetch must be thrown. Note that for errors which occurduring a dialog or document transition, the scope in which errorsare handled is platform specific. For errors which occurduring form item transition, the event is handled in the dialogscope.

Attributes of <goto> are:

Table 48: <goto>Attributes
next	The URI to which to transition.
expr	An ECMAScript expression that yieldsthe URI.
nextitem	The name of the next form item tovisit in the current form.
expritem	An ECMAScript expression that yieldsthe name of the next form item to visit.
fetchaudio	SeeSection 6.1. This defaults to the fetchaudioproperty.
fetchhint	SeeSection 6.1. This defaults to thedocumentfetchhint property.
fetchtimeout	SeeSection 6.1. This defaults to the fetchtimeoutproperty.
maxage	SeeSection 6.1. This defaults to the documentmaxageproperty.
maxstale	SeeSection 6.1. This defaults to thedocumentmaxstale property.

Exactly one of "next", "expr", "nextitem" or "expritem" mustbe specified; otherwise, an error.badfetch event is thrown.

5.3.8 submit element

The <submit> element is used to submit information tothe origin Web server and then transition to the document sentback in the response. Unlike <goto>, it lets you submit alist of variables to the document server via an HTTP GET or POSTrequest. For example, to submit a set of form items to the serveryou might have:

<submit next="log_request" method="post" namelist="name rank serial_number"  fetchtimeout="100s" fetchaudio="audio/brahms2.wav"/>

The dialog to transition to is specified by the URI referencein the <submit>'s next or expr attribute (see[RFC2396], Section 4.2). TheURI is always fetched even if it contains just a fragment. In thecase of a fragment, the URI requested is the base URI of thecurrent document. This means that the following two elements havesubstantially different effects:

<goto next="#get_pin"/><submit next="#get_pin"/>

Note that although the URI is always fetched and the resultingdocument is transitioned to, some <submit> requests can besatisfied by intermediate caches. This might happen, for example,if the origin Web server provides an explicit expiration time withthe response.

If the dialog or document to transition to is not valid (i.e.the dialog or document does not exist), an error.badfetch must bethrown. Note that for errors which occur during a dialog ordocument transition, the scope in which errors are handled isplatform specific.

Attributes of <submit> include:

Table 49: <submit>Attributes
next	The URI reference.
expr	Like next, except that the URIreference is dynamically determined by evaluating the givenECMAScript expression.
namelist	The list of variables to submit. Bydefault, all the named input item variables are submitted. If anamelist is supplied, it may contain individual variablereferences which are submitted with the same qualification usedin the namelist. Declared VoiceXML and ECMAScript variables canbe referenced. If an undeclared variable is referenced in thenamelist, then an error.semantic is thrown (Section 5.1.1).
method	The request method: get (the default)or post.
enctype	The media encoding type of thesubmitted document (when the value of method is "post"). Thedefault is application/x-www-form-urlencoded. Interpreters mustalso support multipart/form-data and may support additionalencoding types.
fetchaudio	SeeSection 6.1. This defaults to the fetchaudioproperty.
fetchhint	SeeSection 6.1. This defaults to thedocumentfetchhint property.
fetchtimeout	SeeSection 6.1. This defaults to the fetchtimeoutproperty.
maxage	SeeSection 6.1. This defaults to the documentmaxageproperty.
maxstale	SeeSection 6.1. This defaults to thedocumentmaxstale property.

Exactly one of "next" or "expr" must be specified; otherwise,an error.badfetch event is thrown.

When an ECMAScript variable is submitted to the server itsvalue is first converted into a string before being submitted. Ifthe variable is an ECMAScript Object the mechanism by which it issubmitted is not currently defined. The mechanism of ECMAScriptObject submission is reserved for future definition. Instead ofsubmitting ECMAScript Objects directly, the application developermay explicitly submit properties of Object as in "date.monthdate.year".

If a <submit> contains a variable which referencesrecorded audio but does not contain an ENCTYPE ofmultipart/form-data, the behavior is not specified. It isprobably inappropriate to attempt to URL-encode large quantitiesof data.

5.3.9 exit element

Returns control to the interpreter context which determineswhat to do next.

<exit/>

This element differs from <return> in that it terminatesall loaded documents, while <return> returns from a<subdialog> invocation. If the <subdialog> caused anew document (or application) to be invoked, then <return>will cause that document to be terminated, but execution willresume after the <subdialog>.

Note that once <exit> returns control to the interpretercontext, the interpreter context is free to do as it wishes. Itmay play a top level menu for the user, drop the call, ortransfer the user to an operator, for example.

Attributes include:

Table 50: <exit>Attributes
expr	An ECMAScript expression that isevaluated as the return value (e.g. "0", "'oops!'", or"field1").
namelist	Variable names to be returned tointerpreter context. The default is to return no variables; thismeans the interpreter context will receive an empty ECMAScriptobject. If an undeclared variable is referenced in the namelist,then an error.semantic is thrown (Section 5.1.1).

Exactly one of "expr" or "namelist" may be specified;otherwise, an error.badfetch event is thrown.

The <exit> element does not throw an "exit" event.

5.3.10 return element

Return ends execution of a subdialog and returns control anddata to a calling dialog.

The attributes are:

Table 51: <return>Attributes
event	Return, then throw this event.
eventexpr	Return, then throw the event to whichthis ECMAScript expression evaluates.
message	A message string providing additionalcontext about the event being thrown. The message is available asthe value of a variable within the scope of the catch element,seeSection 5.2.2.
messageexpr	An ECMAScript expression evaluatingto the message string.
namelist	Variable names to be returned tocalling dialog. The default is to return no variables; this meansthe caller will receive an empty ECMAScript object. If anundeclared variable is referenced in the namelist, then anerror.semantic is thrown (Section 5.1.1).

Exactly one of "event", "eventexpr" or "namelist" may bespecified; otherwise, an error.badfetch event is thrown. Exactlyone of "message" or "messageexpr" may be specified; otherwise, anerror.badfetch event is thrown.

In returning from a subdialog, an event can be thrown at theinvocation point, or data is returned as an ECMAScript objectwith properties corresponding to the variable specified in itsnamelist. A return element that is encountered when not executingas a subdialog throws a semantic error.

The example below shows an event propagated from a subdialogto its calling dialog when the subdialog fails to obtain arecognizable result. It also shows data returned under normalconditions.

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>   <subdialog name="result" src="#getssn">     <nomatch>       <!-- a no match event that is returned by the          subdialog indicates that a valid Social Securityy          number could not be matched. -->       <goto next="http://myservice.example.com/ssn-problems.vxml"/>     </nomatch>     <filled>       <submit namelist="result.ssn"        next="http://myservice.example.com/cgi-bin/process"/>     </filled>   </subdialog> </form><form>   <field name="ssn">       <grammar src="http://grammarlib/ssn.grxml"         type="application/srgs+xml"/>      <prompt> Please say Social Securityy number.</prompt>       <nomatch count="3">        <return event="nomatch"/>       </nomatch>       <filled>         <return namelist="ssn"/>       </filled>   </field> </form></vxml>

The subdialog event handler for <nomatch> is triggeredon the third failure to match; when triggered, it returns fromthe subdialog, and includes the nomatch event to be thrown in thecontext of the calling dialog. In this case, the calling dialogwill execute its <nomatch> handler, rather than the<filled> element, where the resulting action is to executea <goto> element. Under normal conditions, the<filled> element of the subdialog is executed after arecognized Social Securityy number is obtained, and then thisvalue is returned to the calling dialog, and is accessible asresult.ssn.

5.3.11 disconnect element

Causes the interpreter context to disconnect from the user. Asa result, the interpreter context will throw aconnection.disconnect.hangup event and enter the final processingstate (as described inSection 1.5.4). Processing the<disconnect> element will also flush the prompt queue (asdescribed inSection4.1.8).

5.3.12 script element

The <script> element allows the specification of a blockof client-side scripting language code, and is analogous to the[HTML] <SCRIPT>element. For example, this document has a script that computes afactorial.

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">     <script> <![CDATA[        function factorial(n)        {         return (n <= 1)? 1 : n * factorial(n-1);       }     ]]> </script>   <form>     <field name="fact">       <grammar type="application/srgs+xml" src="/grammars/number.grxml"/>      <prompt>        Tell me a number and I'll tell you its factorial.      </prompt>       <filled>         <prompt>           <value expr="fact"/> factorial is           <value expr="factorial(fact)"/>         </prompt>       </filled>     </field>   </form> </vxml>

A <script> element may occur in the <vxml> and<form> elements, or in executable content (in<filled>, <if>, <block>, <catch>, or theshort forms of <catch>). Scripts in the <vxml>element are evaluated just after the document is loaded, alongwith the <var> elements, in document order. Scripts in the<form> element are evaluated in document order, along with<var> elements and form item variables, each time executionmoves into the <form> element. A <script> element inexecutable content is executed, like other executable elements,as it is encountered.

The <script> element has the following attributes:

Table 52: <script>Attributes
src	The URI specifying the location ofthe script, if it is external.
charset	The character encoding of the scriptdesignated by src. UTF-8 and UTF-16 encodings of ISO/IEC 10646must be supported (as in[XML]) and other encodings, as defined in the[IANA], may be supported. Thedefault value is UTF-8.
fetchhint	SeeSection 6.1. This defaults to thescriptfetchhint property.
fetchtimeout	SeeSection 6.1. This defaults to the fetchtimeoutproperty.
maxage	SeeSection 6.1. This defaults to the scriptmaxageproperty.
maxstale	SeeSection 6.1. This defaults to the scriptmaxstaleproperty.

Either an "src" attribute or an inline script (but not both)must be specified; otherwise, an error.badfetch event isthrown.

The VoiceXML <script> element (unlike the[HTML] <SCRIPT> element)does not have a type attribute; ECMAScript is the requiredscripting language for VoiceXML.

Each <script> element is executed in the scope of itscontaining element; i.e., it does not have its own scope. Thismeans for example that variables declared with var in the<script> element are declared in the scope of thecontaining element of the <script> element. (In ECMAScriptterminology, the "variable object" becomes the current scope ofthe containing element of the <script> element).

Here is a time-telling service with a block containing ascript that initializes time variables in the dialog scope of aform:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">   <form>     <var name="hours"/>     <var name="minutes"/>     <var name="seconds"/>     <block>       <script>         var d = new Date();         hours = d.getHours();         minutes = d.getMinutes();         seconds = d.getSeconds();       </script>     </block>     <field name="hear_another">       <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>      <prompt>         The time is <value expr="hours"/> hours,         <value expr="minutes"/> minutes, and         <value expr="seconds"/> seconds.       </prompt>       <prompt>Do you want to hear another time?</prompt>       <filled>         <if cond="hear_another">           <clear/>         </if>       </filled>     </field>   </form> </vxml>

The content of a <script> element is evaluated in thesame scope as a <var> element (see5.1.2 Variable Scopes and5.3.1 VAR).

The ECMAScript scope chain (see section 10.1.4 in[ECMASCRIPT]) is set upso that variables declared either with <var> or inside<script> are put into the scope associated with the elementin which the <var> or <script> element occurs. Forexample, the variable declared in a <script> element undera <form> element has a dialog scope, and can be accessed asa dialog scope variable as follows:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">   <form>     <script>       var now = new Date(); <!-- this has a dialog scope-->    </script>    <var name="seconds" expr="now.getSeconds()"/> <!-- this has a dialog scope-->    <block>       <var name="now" expr="new Date()"/> <!-- this has an anonymous scope -->      <script>         var current = now.getSeconds();       <!-- "now" in the anonymous scope -->        var approx = dialog.now.getSeconds(); <!-- "now" in the dialog scope -->      </script>     </block>   </form> </vxml>

All variables must be declared before being referenced byECMAScript scripts, or by VoiceXML elements as described inSection 5.1.1.

5.3.13 log element

The <log> element allows an application to generate alogging or debug message which a developer can use to help inapplication development or post-execution analysis of applicationperformance.

The <log> element may contain any combination of text(CDATA) and <value> elements. The generated messageconsists of the concatenation of the text and the string form ofthe value of the "expr" attribute of the <value>elements.

The manner in which the message is displayed or logged isplatform-dependent. The usage of label is platform-dependent.Platforms are not required to preserve white space.

ECMAScript expressions in <log> must be evaluated indocument order. The use of the <log> element should have noother side-effects on interpretation.

<log>The card number was <value expr="card_num"/></log>

The <log> element has the following attributes:

Table 53: <log>Attributes
label	An optional string whichmay be used, for example, to indicate the purpose of thelog.
expr	An optional ECMAScript expressionevaluating to a string.

6.Environment and Resources

6.1Resource Fetching

6.1.1 Fetching

A VoiceXML interpreter context needs to fetch VoiceXMLdocuments, and other resources, such as audio files, grammars,scripts, and objects. Each fetch of the content associated with aURI is governed by the following attributes:

Table 54: Fetch Attributes
fetchtimeout	The interval to wait for the contentto be returned before throwing an error.badfetch event. Thevalue is a Time Designation (seeSection 6.5). If not specified, a valuederived from the innermost fetchtimeout property is used.
fetchhint	Defines when the interpreter contextshould retrieve content from the server. prefetch indicates afile may be downloaded when the page is loaded, whereas safeindicates a file that should only be downloaded when actuallyneeded. If not specified, a value derived from the innermostrelevant fetchhint property is used.
maxage	Indicates that the document iswilling to use content whose age is no greater than the specifiedtime in seconds (cf. 'max-age' in HTTP 1.1[RFC2616]). The document isnot willing to use stale content, unless maxstale is also provided.If not specified, a value derived from the innermost relevantmaxage property, if present, is used.
maxstale	Indicates that the document iswilling to use content that has exceeded its expiration time(cf. 'max-stale' in HTTP 1.1[RFC2616]). If maxstale is assigned a value,then the document is willing to accept content that hasexceeded its expiration time by no more than the specified numberof seconds. If not specified, a value derived from the innermostrelevant maxstale property, if present, is used.

When content is fetched from a URI, the fetchtimeout attributedetermines how long to wait for the content (starting from thetime when the resource is needed), and the fetchhint attributedetermines when the content is fetched. The caching policy for aVoiceXML interpreter context utilizes the maxage and maxstaleattributes and is explained in more detail below.

The fetchhint attribute, in combination with the variousfetchhint properties, is merely a hint to the interpreter contextabout when it may schedule the fetch of a resource. Tellingthe interpreter context that it may prefetch a resource does notrequire that the resource be prefetched; it only suggests thatthe resourcemay be prefetched. However, the interpretercontext is always required to honor the safe fetchhint.

When transitioning from one dialog to another, through eithera <subdialog>, <goto>, <submit>, <link>,or <choice> element, there are additional rules that affectinterpreter behavior. If the referenced URI names a document(e.g. "doc#dialog"), or if query data is provided (through POSTor GET), then a new document is obtained (either from a localcache, intermediate cache, or from a origin Web server). When itis obtained, the document goes through its initialization phase(i.e., obtaining and initializing a new application rootdocument if needed, initializing document variables, andexecuting document scripts). The requested dialog (or firstdialog if none is specified) is then initialized and executionof the dialog begins.

Generally, if a URI reference contains only a fragment (e.g.,"#my_dialog"), then no document is fetched, and no initializationof that document is performed. However, <submit> alwaysresults in a fetch, and if a fragment is accompanied by anamelist attribute there will also be a fetch.

Another exception is when a URI reference in a leaf documentreferences the application root document. In this case, the rootdocument is transitioned to without fetching and withoutinitialization even if the URI reference contains an absolute orrelative URI (seeSection1.5.2 and[RFC2396]).However, if the URI reference to the root document contains aquery string or a namelist attribute, the root document isfetched.

Elements that fetch VoiceXML documents also support thefollowing additional attribute:

Table 55: Additional FetchAttribute
fetchaudio	The URI of the audio clipto play while the fetch is being done. If not specified, thefetchaudio property is used, and if that property is not set, noaudio is played during the fetch. The fetching of the audio clipis governed by the audiofetchhint, audiomaxage, audiomaxstale,and fetchtimeout properties in effect at the time of the fetch.The playing of the audio clip is governed by the fetchaudiodelay,and fetchaudiominimum properties in effect at the time of thefetch.

The fetchaudio attribute is useful for enhancing a userexperience when there may be noticeable delays while the nextdocument is retrieved. This can be used to play background music,or a series of announcements. When the document is retrieved, theaudio file is interrupted if it is still playing. If an erroroccurs retrieving fetchaudio from its URI, no badfetch event isthrown and no audio is played during the fetch.

6.1.2 Caching

The VoiceXML interpreter context, like[HTML] visual browsers, can use caching toimprove performance in fetching documents and other resources;audio recordings (which can be quite large) are as common toVoiceXML documents as images are to HTML pages. In a visualbrowser it is common to include end user controls to update orrefresh content that is perceived to be stale. This is not thecase for the VoiceXML interpreter context, since it lacksequivalent end user controls. Thus enforcement of cache refreshis at the discretion of the document through appropriate use ofthe maxage, and maxstale attributes.

The caching policy used by the VoiceXML interpreter contextmust adhere to the cache correctness rules of HTTP 1.1 ([RFC2616]). In particular,the Expires and Cache-Control headers must be honored. Thefollowing algorithm summarizes these rules and represents theinterpreter context behavior when requesting a resource:

If the resource is not present in the cache, fetch it fromthe server using get.
If the resource is in the cache,
- If a maxage value is provided,
  - If age of the cached resource <= maxage,
    - If the resource has expired,
      - Perform maxstale check.
    - Otherwise, use the cached copy.
  - Otherwise, fetch it from the server using get.
- Otherwise,
  - If the resource has expired,
    - Perform maxstale check.
  - Otherwise, use the cached copy.

The "maxstale check" is:

If maxstale is provided,
- If cached copy has exceeded its expiration time by no morethan maxstale seconds, then use the cached copy.
- Otherwise, fetch it from the server using get.
Otherwise, fetch it from the server using get.

Note: it is an optimization to perform a "get if modified" ona document still present in the cache when the policy requires afetch from the server.

The maxage and maxstale properties are allowed to have nodefault value whatsoever. If the value is not provided by thedocument author, and the platform does not provide a defaultvalue, then the value is undefined and the 'Otherwise' clause ofthe algorithm applies. All other properties must provide adefault value (either as given by the specification or by theplatform).

While the maxage and maxstale attributes are drawn from anddirectly supported by HTTP 1.1, some resources may be addressedby URIs that name protocols other than HTTP. If the protocol doesnot support the notion of resource age, the interpreter contextshall compute a resource's age from the time it was received. Ifthe protocol does not support the notion of resource staleness,the interpreter context shall consider the resource to haveexpired immediately upon receipt.

6.1.2.1 Controlling the Caching Policy

VoiceXML allows the author to override the default cachingbehavior for each use of each resource (except for any documentreferenced by the <vxml> element's application attribute:there is no markup mechanism to control the caching policy foran application root document).

Each resource-related element may specify maxage and maxstaleattributes. Setting maxage to a non-zero value can be used to geta fresh copy of a resource that may not have yet expired in thecache. A fresh copy can be unconditionally requested by settingmaxage to zero.

Using maxstale enables the author to state that an expiredcopy of a resource, that is not too stale (according to the rulesof HTTP 1.1), may be used. This can improve performance byeliminating a fetch that would otherwise be required to get afresh copy. It is especially useful for authors who may not havedirect server-side control of the expiration dates of largestatic files.

6.1.3 Prefetching

Prefetching is an optional feature that an interpreter contextmay implement to obtain a resource before it is needed. Aresource that may be prefetched is identified by an element whosefetchhint attribute equals "prefetch". When an interpretercontext does prefetch a resource, it must ensure that theresource fetched is precisely the one needed. In particular, ifthe URI is computed with an expr attribute, the interpretercontext must not move the fetch up before any assignments to theexpression's variables. Likewise, the fetch for a <submit>must not be moved prior to any assignments of the namelistvariables.

The expiration status of a resource must be checked on eachuse of the resource, and, if its fetchhint attribute is"prefetch", then it is prefetched. The check must follow thecaching policy specified in Section 6.1.2.

6.1.4 Protocols

The "http" URI scheme must be supported by VoiceXMLplatforms, the "https" protocol should be supported and other URIprotocols may be supported.

6.2Metadata Information

Metadata information is information about the document ratherthan the document's content. VoiceXML 2.0 provides two elementsin which metadata information can be expressed: <meta> and<metadata>. The <metadata> element provides moregeneral and powerful treatment of metadata information than<meta>.

VoiceXML does not specify required metadata information.However, it does recommend that metadata is expressed usingthe <metadata> element with information in ResourceDescription Framework (RDF)[RDF-SYNTAX] using the Dublin Core version 1.0RDF schema[DC] (seeSection 6.2.2).

6.2.1 meta element

The <meta> element specifies meta information as in[HTML]. There are two types of<meta>.

The first type specifies a metadata property of the documentas a whole and is expressed by the pair of attributes, nameand content. For example to specify the maintainer of aVoiceXML document:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">   <meta name="maintainer" content="jpdoe@anycompany.example.com"/>  <form>    <block>         <prompt>Hello</prompt>     </block>  </form>  </vxml>

The second type of <meta> specifies HTTP responseheaders and is expressed by the pair of attributeshttp-equiv and content. In the following example, thefirst <meta> element sets an expiration date that preventscaching of the document; the second <meta> element sets theDate header.

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">   <meta http-equiv="Expires" content="0"/>   <meta http-equiv="Date"    content="Thu, 12 Dec 2000 23:27:21 GMT"/>   <form>    <block>         <prompt>Hello</prompt>     </block>  </form>  </vxml>

Attributes of <meta> are:

Table 56: <meta>Attributes
name	The name of the metadataproperty.
content	The value of the metadataproperty.
http-equiv	The name of an HTTPresponse header.

Exactly one of "name" or "http-equiv" must be specified;otherwise, an error.badfetch event is thrown.

6.2.2 metadata element

The <metadata> element is container in which informationabout the document can be placed using a metadata schema.Although any metadata schema can be used with <metadata>,it is recommended that the RDF schema is used in conjunction withmetadata properties defined in the Dublin Core MetadataInitiative.

RDF is a declarative language and provides a standard way forusing XML to represent metadata in the form of statements aboutproperties and relationships of items on the Web. Contentcreators should refer to W3C metadata Recommendations[RDF-SYNTAX] and[RDF-SCHEMA] as well asthe Dublin Core Metadata Initiative[DC], which is a set of generally applicablecore metadata properties (e.g., Title, Creator, Subject,Description, Copyrights, etc.).

The following Dublin Core metadata properties are recommendedin <metadata>:

Table 57: Recommended Dublin CoreMetadata Properties
Creator	An entity primarilyresponsible for making the content of the resource.
Rights	Information about rightsheld in and over the resource.
Subject	The topic of the contentof the resource. Typically, a subject will be expressed askeywords, key phrases or classification codes. Recommended bestpractice is to select values from a controlled vocabulary orformal classification scheme.

Here is an example of how <metadata> can be included ina VoiceXML document using the Dublin Core version 1.0 RDF schema[DC]:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"> <metadata>   <rdf:RDF       xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"       xmlns:rdfs = "http://www.w3.org/TR/1999/PR-rdf-schema-19990303#"       xmlns:dc = "http://purl.org/metadata/dublin_core#"><!-- Metadata about the VoiceXML document -->   <rdf:Description about="http://www.example.com/meta.vxml"       dc:Title="Directory Enquiry Service"       dc:Description="Directory Enquiry Service for London in VoiceXML"       dc:Publisher="W3C"       dc:Language="en"       dc:Date="2002-02-12"       dc:Rights="Copyright 2002 John Smith"       dc:Format="application/voicexml+xml" >                       <dc:Creator>          <rdf:Seq>             <rdf:li>Jackie Crystal</rdf:li>             <rdf:li>William Lee</rdf:li>          </rdf:Seq>       </dc:Creator>   </rdf:Description>  </rdf:RDF> </metadata> <form>    <block>         <prompt>Hello</prompt>     </block> </form>  </vxml>

6.3property element

The <property> element sets a property value. Propertiesare used to set values that affect platform behavior, such as therecognition process, timeouts, caching policy, etc.

Properties may be defined for the whole application, for thewhole document at the <vxml> level, for a particular dialogat the <form> or <menu> level, or for a particularform item. Properties apply to their parent element and all thedescendants of the parent. A property at a lower level overridesa property at a higher level. When different values for a propertyare specified at the same level, the last one in document orderapplies. Properties specified in the application root documentprovide default values for properties in every document in theapplication; properties specified in an individual documentoverride property values specified in the application rootdocument.

If a platform detects that the value of a property is invalid,then it should throw an error.semantic.

In some cases, <property> elements specify defaultvalues for element attributes, such as timeout or bargein. Forexample, to turn off bargein by default for all the prompts in aparticular form:

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"> <form>   <property name="bargein" value="false"/>   <block>     <prompt>      This introductory prompt cannot be barged into.    </prompt>     <prompt>      And neither can this prompt.    </prompt>     <prompt bargein="true">      But this one <emphasis>can</emphasis> be barged into.    </prompt>   </block>   <field type="boolean">    <prompt>      Please say yes or no.    </prompt>  </field>  </form></vxml>

The <property> element has the following attributes:

Table 58: <property>Attributes
name	The name of the property.
value	The value of the property.

6.3.1 Platform-Specific Properties

An interpreter context is free to provide platform-specificproperties. For example, to set the "multiplication factor"for this platform in the scope of this document:

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>  <property name="com.example.multiplication_factor" value="42"/>   <block>     <prompt> Welcome </prompt>   </block></form>  </vxml>

By definition, platform-specific properties introduceincompatibilities which reduce application portability.To minimize them, the following interpreter contextguidelines are strongly recommended:

Platform-specific properties should use reverse domain namesto eliminate potential collisions as in: com.example.foo,which is clearly different from net.example.foo

An interpreter context mustnot throw anerror.unsupported.property event when encountering a property itcannot process; rather the interpreter context mustjust ignore that property.

6.3.2 Generic Speech Recognizer Properties

The generic speech recognizer properties mostly are taken fromthe Java Speech API[JSAPI]:

Table 59: Generic Speech RecognizerProperties
confidencelevel	The speech recognition confidencelevel, a float value in the range of 0.0 to 1.0. Results arerejected (a nomatch event is thrown) whenapplication.lastresult$.confidence is below this threshold.A value of 0.0 means minimum confidence is needed for arecognition, and a value of 1.0 requires maximum confidence.The value is a Real Number Designation (seeSection 6.5). The defaultvalue is 0.5.
sensitivity	Set the sensitivity level. A value of1.0 means that it is highly sensitive to quiet input. A value of0.0 means it is least sensitive to noise. The value is aReal Number Designation (seeSection 6.5). The default value is0.5.
speedvsaccuracy	A hint specifying the desired balancebetween speed vs. accuracy. A value of 0.0 means fastestrecognition. A value of 1.0 means best accuracy. The valueis a Real Number Designation (seeSection 6.5). The default is value0.5.
completetimeout	The length of silence required following user speech beforethe speech recognizer finalizes a result (either accepting it orthrowing a nomatch event). The complete timeout is used when thespeech is a complete match of an active grammar. By contrast, theincomplete timeout is used when the speech is an incomplete matchto an active grammar. A long complete timeout value delays the result completion andtherefore makes the computer's response slow. A short completetimeout may lead to an utterance being broken up inappropriately.Reasonable complete timeout values are typically in the range of0.3 seconds to 1.0 seconds. The value is a Time Designation(seeSection 6.5). Thedefault is platform-dependent. SeeAppendix D. Although platforms must parse the completetimeout property,platforms are not required to support the behavior ofcompletetimeout. Platforms choosing not to support the behaviorof completetimeout must so document and adjust the behavior oftheincompletetimeout property as described below.
incompletetimeout	The required length of silence following user speech afterwhich a recognizer finalizes a result. The incomplete timeoutapplies when the speech prior to the silence is an incompletematch of all active grammars. In this case, once thetimeout is triggered, the partial result is rejected (with anomatch event). The incomplete timeout also applies when the speech prior tothe silence is a complete match of an active grammar, but whereit is possible to speak further and still match the grammar. Bycontrast, the complete timeout is used when the speech is acomplete match to an active grammar and no further words can bespoken. A long incomplete timeout value delays the result completionand therefore makes the computer's response slow. A shortincomplete timeout may lead to an utterance being broken upinappropriately. The incomplete timeout is usually longer than the completetimeout to allow users to pause mid-utterance (for example, tobreathe). SeeAppendixD. Platforms choosing not to support thecompletetimeoutproperty (described above) must use the maximum of thecompletetimeout and incompletetimeout values as the value for theincompletetimeout. The value is a Time Designation (seeSection 6.5).
maxspeechtimeout	The maximum duration of user speech. If this time elapsedbefore the user stops speaking, the event "maxspeechtimeout" isthrown. The value is a Time Designation (seeSection 6.5). The defaultduration is platform-dependent.

6.3.3 Generic DTMF Recognizer Properties

Several generic properties pertain to DTMF grammarrecognition:

Table 60: Generic DTMF RecognizerProperties
interdigittimeout	The inter-digit timeoutvalue to use when recognizing DTMF input. The value is a TimeDesignation (seeSection6.5). The default is platform-dependent. SeeAppendix D.
termtimeout	The terminating timeoutto use when recognizing DTMF input. The value is a TimeDesignation (seeSection6.5). The default value is "0s".Appendix D.
termchar	The terminating DTMFcharacter for DTMF input recognition. The default value is "#".SeeAppendix D.

6.3.4 Prompt and Collect Properties

These properties apply to the fundamental platform prompt andcollect cycle:

Table 61: Prompt and CollectProperties
bargein	The bargein attribute touse for prompts. Setting this to true allows bargein by default.Setting it to false disallows bargein. The default value is"true".
bargeintype	Sets the type of bargeinto be speech or hotword. Default is platform-specific. SeeSection 4.1.5.1.
timeout	The time after which anoinput event is thrown by the platform. The value is aTime Designation (seeSection6.5). The default value is platform-dependent. SeeAppendix D.

6.3.5 Fetching Properties

These properties pertain to the fetching of new documents andresources (note that maxage and maxstale properties may have nodefault value - seeSection 6.1.2):

Table 62: Fetching Properties
audiofetchhint	This tells the platformwhether or not it can attempt to optimize dialog interpretationby pre-fetching audio. The value is either safe to say that audiois only fetched when it is needed, never before; or prefetch topermit, but not require the platform to pre-fetch the audio. Thedefault value is prefetch.
audiomaxage	Tells the platform themaximum acceptable age, in seconds, of cached audio resources.The default is platform-specific.
audiomaxstale	Tells the platform themaximum acceptable staleness, in seconds, of expired cached audioresources. The default is platform-specific.
documentfetchhint	Tells the platformwhether or not documents may be pre-fetched. The value is eithersafe (the default), or prefetch.
documentmaxage	Tells the platform themaximum acceptable age, in seconds, of cached documents. Thedefault is platform-specific.
documentmaxstale	Tells the platform themaximum acceptable staleness, in seconds, of expired cacheddocuments. The default is platform-specific.
grammarfetchhint	Tells the platformwhether or not grammars may be pre-fetched. The value is eitherprefetch (the default), or safe.
grammarmaxage	Tells the platform themaximum acceptable age, in seconds, of cached grammars. Thedefault is platform-specific.
grammarmaxstale	Tells the platform themaximum acceptable staleness, in seconds, of expired cachedgrammars. The default is platform-specific.
objectfetchhint	Tells the platformwhether the URI contents for <object> may be pre-fetched ornot. The values are prefetch (the default), or safe.
objectmaxage	Tells the platform themaximum acceptable age, in seconds, of cached objects. Thedefault is platform-specific.
objectmaxstale	Tells the platform themaximum acceptable staleness, in seconds, of expired cachedobjects. The default is platform-specific.
scriptfetchhint	Tells whether scripts maybe pre-fetched or not. The values are prefetch (the default), orsafe.
scriptmaxage	Tells the platform themaximum acceptable age, in seconds, of cached scripts. Thedefault is platform-specific.
scriptmaxstale	Tells the platform themaximum acceptable staleness, in seconds, of expired cachedscripts. The default is platform-specific.
fetchaudio	The URI of the audio toplay while waiting for a document to be fetched. The default isnot to play any audio during fetch delays. There are nofetchaudio properties for audio, grammars, objects, and scripts.The fetching of the audio clip is governed by the audiofetchhint,audiomaxage, audiomaxstale, and fetchtimeout properties in effectat the time of the fetch. The playing of the audio clip isgoverned by the fetchaudiodelay, and fetchaudiominimum propertiesin effect at the time of the fetch.
fetchaudiodelay	The time interval to wait at the start of a fetch delay beforeplaying the fetchaudio source. The value is a TimeDesignation (seeSection6.5). The default interval is platform-dependent, e.g."2s". The idea is that when a fetch delay is short, it maybe better to have a few seconds of silence instead of a bit offetchaudio that is immediately cut off.
fetchaudiominimum	The minimum time interval to play a fetchaudio source, oncestarted, even if the fetch result arrives in the meantime.The value is a Time Designation (seeSection 6.5). The default isplatform-dependent, e.g., "5s". The idea is that once theuser does begin to hear fetchaudio, it should not be stopped tooquickly.
fetchtimeout	The timeout for fetches.The value is a Time Designation (seeSection 6.5). The default value isplatform-dependent.

6.3.6 Miscellaneous Properties

Table 63: MiscellaneousProperties
inputmodes	This property determineswhich input modality to use. The input modes to enable: dtmf andvoice. On platforms that support both modes, inputmodes defaultsto "dtmf voice". To disable speech recognition, set inputmodes to"dtmf". To disable DTMF, set it to "voice". One use for thiswould be to turn off speech recognition in noisy environments.Another would be to conserve speech recognition resources byturning them off where the input is always expected to be DTMF.This property does not control the activation of grammars. Forinstance, voice-only grammars may be active when the inputmode isrestricted to DTMF. Those grammars would not be matched, however,because the voice input modality is not active.
universals	Platforms may optionally provide platform-specific universalcommand grammars, such as "help", "cancel", or "exit" grammars,that are always active (except in the case of modal inputitems - seeSection 3.1.4)and which generate specific events. Production-grade applications often need to define their ownuniversal command grammars, e.g., to increase applicationportability or to provide a distinctive interface. They specifynew universal command grammars with <link> elements. Theyturn off the default grammars with this property. Default catchhandlers are not affected by this property. The value "none" is the default, and means that all platformdefault universal command grammars are disabled. The value "all"turns them all on. Individual grammars are enabled by listingtheir names separated by spaces; for example, "cancel exithelp".
maxnbest	This property controls the maximum size of the"application.lastresult$" array; the array is constrained to beno larger than the value specified by 'maxnbest'. This propertyhas a minimum value of 1. The default value is 1.

Our last example shows several of these properties used atmultiple levels.

<?xml version="1.0" encoding="UTF-8"?> <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd">  <!-- set default characteristics for page -->   <property name="audiofetchhint" value="safe"/>   <property name="confidencelevel" value="0.75"/>  <form>     <!-- override defaults for this form only -->     <property name="confidencelevel" value="0.5"/>     <property name="bargein" value="false"/>     <grammar src="address_book.grxml" type="application/srgs+xml"/>    <block>       <prompt> Welcome to the Voice Address Book </prompt>     </block>     <initial name="start">       <!-- override default timeout value -->       <property name="timeout" value="5s"/>       <prompt> Who would you like to call? </prompt>     </initial>     <field name="person">       <prompt>        Say the name of the person you would like to call.      </prompt>     </field>     <field name="location">       <prompt>        Say the location of the person you would like to call.      </prompt>     </field>     <field name="confirm">      <grammar type="application/srgs+xml" src="/grammars/boolean.grxml"/>      <!-- Use actual utterances to playback recognized words,               rather than returned slot values -->       <prompt>         You said to call <value expr="person$.utterance"/>         at <value expr="location$.utterance"/>.         Is this correct?       </prompt>       <filled>         <if cond="confirm">           <submit namelist="person location"            next="http://www.messagecentral.example.com/voice/make_call" />        </if>         <clear/>       </filled>     </field>   </form> </vxml>

6.4param element

The <param> element is used to specify values that arepassed to subdialogs or objects. It is modeled on the[HTML] <PARAM> element.Its attributes are:

Table 64: <param>Attributes
name	The name to be associatedwith this parameter when the object or subdialog is invoked.
expr	An expression thatcomputes the value associated with name.
value	Associates a literalstring value with name.
valuetype	One of data or ref, bydefault data; used to indicate to an object if the valueassociated with name is data or a URI (ref). This is not used for<subdialog> since values are always data.
type	The media type of theresult provided by a URI if the valuetype is ref; only relevantfor uses of <param> in <object>.

Exactly one of "expr" or "value" must be specified; otherwise,an error.badfetch event is thrown.

The use of valuetype and type is optional in general, althoughthey may be required by specific objects. When <param> iscontained in a <subdialog> element, the values specified byit are used to initialize dialog <var> elements in thesubdialog that is invoked. SeeSection 2.3.4 for details regardinginitialization of variables in subdialogs using<param>. When <param> is contained in an<object>, the use of the parameter data is specific to theobject that is being invoked, and is outside the scope of theVoiceXML specification.

Below is an example of <param> used as part of an<object>. In this case, the first two <param>elements have expressions (implicitly of valuetype="data"), thethird <param> has an explicit value, and the fourth is aURI that returns a media type of text/plain. The meaning of thisdata is specific to the object.

<object name="debit"  classid="method://credit-card/gather_and_debit"  data="http://www.recordings.example.com/prompts/credit/jesse.jar">   <param name="amount" expr="document.amt"/>   <param name="vendor" expr="vendor_num"/>   <param name="application_id" value="ADC5678-QWOO"/>   <param name="authentication_server"   value="http://auth-svr.example.com"    valuetype="ref"   type="text/plain"/> </object>

The next example illustrates <param> used with<subdialog>. In this case, two expressions are used toinitialize variables in the scope of the subdialog form.

Form with calling dialog

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>   <subdialog name="result" src="http://another.example.com/#getssn">     <param name="firstname" expr="document.first"/>     <param name="lastname"  expr="document.last"/>     <filled>       <submit namelist="result.ssn"        next="http://myservice.example.com/cgi-bin/process"/>     </filled>   </subdialog> </form></vxml>

Subdialog in http://another.example.com

 <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"   xsi:schemaLocation="http://www.w3.org/2001/vxml    http://www.w3.org/TR/voicexml20/vxml.xsd"><form>   <var name="firstname"/>   <var name="lastname"/>   <field name="ssn">       <grammar src="http://grammarlib/ssn.grxml"        type="application/srgs+xml"/>      <prompt>        Please say Social Securityy number.      </prompt>       <filled>         <if cond="validssn(firstname,lastname,ssn)">           <assign name="status" expr="true"/>           <return namelist="status ssn"/>         <else/>           <assign name="status" expr="false"/>           <return namelist="status"/>         </if>       </filled>   </field> </form></vxml>

Using <param> in a <subdialog> is a convenient wayof passing data to a subdialog without requiring the use ofserver side scripting.

6.5Value Designations

Several VoiceXML parameter values follow the conventions usedin the W3C's Cascading Style Sheet Recommendation[CSS2].

6.5.1 Integers and Real Numbers

Real numbers and integers are specified in decimal notationonly. An integer consists of one or more digits "0" to "9". Areal number may be an integer, or it may be zero or more digitsfollowed by a dot (.) followed by one or more digits. Bothintegers and real numbers may be preceded by a "-" or "+" toindicate the sign.

6.5.2 Times

Time designations consist of a non-negative real numberfollowed by a time unit identifier. The time unit identifiersare:

ms: milliseconds
s: seconds

Examples include: "3s", "850ms", "0.7s", ".5s"and "+1.5s".

Appendices

Appendix A — Glossary of Terms

active grammar: A speech or DTMF grammar that is currently active. This isbased on the currently executing element, and the scope elementsof the currently defined grammars.
application: A collection ofVoiceXML documents that are taggedwith the same application name attribute.
ASR: Automatic speech recognition.
author: The creator of aVoiceXML document.
catch element: A <catch> block or one of its abbreviated forms.Certain default catch elements are defined by theVoiceXMLinterpreter.
control item: Aform item whose purpose is either to contain a blockof procedural logics (<block>) or to allow initial promptsfor a mixed initiative dialog (<initial>).
CSS W3C Cascading Style Sheet specification.: See[CSS2]
dialog: An interaction with the user specified in aVoiceXMLdocument. Types of dialogs includeforms andmenus.
DTMF (Dual Tone Multi-Frequency): Touch-tone or push-button dialing. Pushing a button on atelephone keypad generates a sound that is a combination of twotones, one high frequency and the other low frequency.
ECMAScript: A standard version of JavaScript backed by the EuropeanComputer Manufacturer's Association. See[ECMASCRIPT]
event: A notification "thrown" by theimplementationplatform,VoiceXML interpreter context,VoiceXMLinterpreter, or VoiceXML code. Events include exceptionalconditions (semantic errors), normal errors (user did not saysomething recognizable), normal events (user wants to exit), anduser defined events.
executable content: Procedural logic that occurs in <block>,<filled>, andevent handlers.
form: Adialog that interacts with theuser in ahighly flexible fashion with the computer and theusersharing the initiative.
FIA (Form Interpretation Algorithm): An algorithm implemented in aVoiceXML interpreterwhich drives the interaction between the user and a VoiceXML formor menu. SeeSection 2.1.6andAppendix C.
form item: An element of <form> that can be visited during formexecution: <initial>, <block>, <field>,<record>, <object>, <subdialog>, and<transfer>.
form item variable: A variable, either implicitly or explicitly defined,associated with eachform item in aform. If theform item variable is undefined, the form interpretationalgorithm will visit the form item and use it to interact withthe user.
implementation platform: A computer with the requisite software and/or hardware tosupport the types of interaction defined by VoiceXML.
input item: Aform item whose purpose is to input a input itemvariable. Input items include <field>, <record>,<object>, <subdialog>, and <transfer>.
language identifier: A language identifier labels information content as being ofa particular human language variant. Following the XMLspecification for language identification[XML], a legal language identifier is identifiedby an RFC 3066[RFC3066]code. A language code is required by RFC 3066. A country code orother subtag identifier is optional by RFC 3066.
link: A set of grammars that when matched by something theuser says or keys in, either transitions to a new dialogor document or throws an event in the current form item.
menu: Adialog presenting theuser with a set ofchoices and takes action on the selected one.
mixed initiative: A computer-human interaction in which either the computer orthe human can take initiative and decide what to do next.
JSGF: Java API Speech Grammar Format. A proposed standard forrepresenting speech grammars. See[JSGF]
object: A platform-specific capability with an interface availablevia VoiceXML.
request: A collection of data including: a URI specifying a documentserver for the data, a set of name-value pairs of data to beprocessed (optional), and a method of submission for processing(optional).
script: A fragment of logic written in a client-side scriptinglanguage, especiallyECMAScript, which is a scriptinglanguage that must be supported by anyVoiceXMLinterpreter.
session: A connection between auser and animplementationplatform, e.g. a telephone call to a voice response system.One session may involve the interpretation of more than oneVoiceXML document.
SRGS (Speech Recognition Grammar Specification): A standard format for context-free speech recognitiongrammars being developed by the W3C Voice Browser group. BothABNF and XML formats are defined[SRGS].
SSML (Speech Synthesis Markup Language): A standard format for speech synthesis being developed by theW3C Voice Browser group[SSML].
subdialog: A VoiceXML dialog (or document) invoked from the currentdialog in a manner analogous to function calls.
tapered prompts: A set of prompts used to vary a message given to the human.Prompts may be tapered to be more terse with use (fieldprompting), or more explicit (help prompts).
throw: An element that fires anevent.
TTS: text-to-speech; speech synthesis.
user: A person whose interaction with animplementationplatform is controlled by aVoiceXML interpreter.
URI: Uniform Resource Indicator.
URL: Uniform Resource Locator.
VoiceXML document: An XML document conforming to the VoiceXMLspecification.
VoiceXML interpreter: A computer program that interprets aVoiceXML documentto control animplementation platform for the purpose ofconducting an interaction with auser.
VoiceXML interpreter context: A computer program that uses aVoiceXML interpreter tointerpret aVoiceXML Document and that may also interactwith theimplementation platform independently of theVoiceXML interpreter.
W3C: World Wide Web Consortiumhttp://www.w3.org/

Appendix B — VoiceXML Document TypeDefinition

The VoiceXML DTD is located athttp://www.w3.org/TR/voicexml20/vxml.dtd.

Due to DTD limitations, the VoiceXML DTD does not correctlyexpress that the <metadata> element can contain elementsfrom other XML namespaces.

Note: the VoiceXML DTD includes modified elements from theDTDs of the Speech Recognition Grammar Specification 1.0[SRGS] and the Speech SynthesisMarkup Language 1.0[SSML].

Appendix C — Form InterpretationAlgorithm

The form interpretation algorithm (FIA) drives the interactionbetween the user and a VoiceXML form or menu. A menu can beviewed as a form containing a single field whose grammar andwhose <filled> action are constructed from the<choice> elements.

The FIA must handle:

Form initialization.
Prompting, including the management of the prompt countersneeded for prompt tapering.
Grammar activation and deactivation at the form and form itemlevels.
Entering the form with an utterance that matched one of theform's document-scoped grammars while the user was visitinga different form or menu.
Leaving the form because the user matched another form, menu,or link's document-scoped grammar.
Processing multiple field fills from one utterance, includingthe execution of the relevant <filled> actions.
Selecting the next form item to visit, and then processingthat form item.
Choosing the correct catch element to handle any events thrownwhile processing a form item.

First we define some terms and data structures used in theform interpretation algorithm:

active grammar set: The set of grammars active during a VoiceXML interpretercontext's input collection operation.
utterance: A summary of what the user said or keyed in, including thespecific grammar matched, and a semantic result consisting of aninterpretation structure or, where there is no semanticinterpretation, the raw text of the input (seeSection 3.1.6). An exampleutterance might be: "grammar 123 was matched, and the semanticinterpretation is {drink: "coke" pizza: {number: "3" size:"large"}}".
execute: To execute executable content – either a block, afilled action, or a set of filled actions. If an event is thrownduring execution, the execution of the executable content isaborted. The appropriate event handler is then executed, and thismay cause control to resume in a form item, in the next iterationof the form's main loop, or outside of the form. If a<goto> is executed, the transfer takes place immediately,and the remaining executable content is not executed.

Here is the conceptual form interpretation algorithm. The FIAcan start with no initial utterance, or with an initial utterancepassed in from another dialog:

////Initialization Phase//foreach ( <var>, <script> and form item, in document order )   if ( the element is a <var> )     Declare the variable, initializing it to the value of     the "expr" attribute, if any, or else to undefined.   else if ( the element is a <script> )     Evaluate the contents of the script if inlined or else     from the location specified by the "src" attribute.   else if ( the element is a form item )     Create a variable from the "name" attribute, if any, or     else generate an internal name.  Assign to this variable     the value of the "expr" attribute, if any, or else undefined.            foreach ( input item and <initial> element )                 Declare a prompt counter and set it to 1.if ( user entered this form by speaking to its      grammar while in a different form){    Enter the main loop below, but start in    the process phase, not the select phase:    we already have a collection to process.}////Main Loop: select next form item and execute it.//while ( true ){    //    //Select Phase: choose a form item to visit.    //    if ( the last main loop iteration ended              with a <goto nextitem> )        Select that next form item.    else if (there is a form item with an              unsatisfied guard condition )        Select the first such form item in document order.    else        Do an <exit/> -- the form is full and specified no transition.    //    //Collect Phase: execute the selected form item.    //    // Queue up prompts for the form item.    unless ( the last loop iteration ended with            a catch that had no <reprompt>,         and the active dialog was not changed )    {        Select the appropriate prompts for an input item or <initial>.        Queue the selected prompts for play prior to        the next collect operation.        Increment an input item's or <initial>'s prompt counter.    }    // Activate grammars for the form item.    if ( the form item is modal )        Set the active grammar set to the form item grammars,        if any. (Note that some form items, e.g. <block>,        cannot have any grammars).    else        Set the active grammar set to the form item        grammars and any grammars scoped to the form,        the current document, and the application root        document.    // Execute the form item.    if ( a <field> was selected )        Collect an utterance or an event from the user.    else if ( a <record> was chosen )        Collect an utterance (with a name/value pair        for the recorded bytes) or event from the user.    else if ( an <object> was chosen )        Execute the object, setting the <object>'s        form item variable to the returned ECMAScript value.    else if ( a <subdialog> was chosen )        Execute the subdialog, setting the <subdialog>'s        form item variable to the returned ECMAScript value.    else if ( a <transfer> was chosen )        Do the transfer, and (if wait is true) set the        <transfer> form item variable to the returned        result status indicator.    else if ( an <initial> was chosen )        Collect an utterance or an event from the user.    else if ( a <block> was chosen )    {        Set the block's form item variable to a defined value.        Execute the block's executable context.    }    //    //Process Phase: process the resulting utterance or event.    //    Assign the utterance and other information about the last    recognition to application.lastresult$.            // Must have an utterance    if ( the utterance matched a grammar belonging to a <link> )      If the link specifies an "next" or "expr" attribute,      transition to that location.  Else if the link specifies an      "event" or "eventexpr" attribute, generate that event.    else if ( the utterance matched a grammar belonging to a <choice> )      If the choice specifies an "next" or "expr" attribute,      transition to that location.  Else if the choice specifies      an "event" or "eventexpr" attribute, generate that event.    else if ( the utterance matched a grammar from outside the current              <form> or <menu> )    {      Transition to that <form> or <menu>, carrying the utterance      to the new FIA.    }    // Process an utterance spoken to a grammar from this form.    // First copy utterance result property values into corresponding    // form item variables.    Clear all "just_filled" flags.    if ( the grammar is scoped to the field-level ) {       // This grammar must be enclosed in an input item.  The input item       // has an associated ECMAScript variable (referred to here as the input       // item variable) and slot name.       if ( the result is not a structure )         Copy the result into the input item variable.       elseif ( a top-level property in the result matches the slot name                or the slot name is a dot-separated path matching a                subproperty in the result )         Copy the value of that property into the input item variable.       else         Copy the entire result into the input item variable       Set this input item's "just_filled" flag.    }    else {       foreach ( property in the user's utterance )       {          if ( the property matches an input item's slot name )          {             Copy the value of that property into the input item's form             item variable.             Set the input item's "just_filled" flag.          }       }    }    // Set all <initial> form item variables if any input items are filled.    if ( any input item variable is set as a result of the user utterance )        Set all <initial> form item variables to true.    // Next execute any triggered <filled> actions.     foreach ( <filled> action in document order )    {        // Determine the input item variables the <filled> applies to.        N = the <filled>'s "namelist" attribute.        if ( N equals "" )        {           if ( the <filled> is a child of an input item )             N = the input item's form item variable name.           else if ( the <filled> is a child of a form )             N = the form item variable names of all the input                 items in that form.        }        // Is the <filled> triggered?        if ( any input item variable in the set N was "just_filled"               AND  (  the <filled> mode is "all"                           AND all variables in N are filled                       OR the <filled> mode is "any"                           AND any variables in N are filled) )             Execute the <filled> action.         If an event is thrown during the execution of a <filled>,             event handler selection starts in the scope of the <filled>,         which could be an input item or the form itself.    }    // If no input item is filled, just continue.}

During FIA execution, events may be generated at severalpoints. These events are processed differently depending on whichphase is active.

Before a form item is selected (i.e. during the Initializationand Select phases), events are generated at the dialog level. Thecorresponding catch handler is located and executed. If the catchdoes not result in a transition from the current dialog, FIAexecution will terminate.

Similarly, events triggered after a form item is selected(i.e. during the Collect and Process phases) are usuallygenerated at the form item level. There is one exception: eventstriggered by a dialog level <filled> are generated at thedialog level. The corresponding catch handler is located andexecuted. If the catch does not result in a transition, thecurrent FIA loop is terminated and Select phase is reentered.

Appendix D — Timing Properties

The various timing properties for speech and DTMF recognitionwork together to define the user experience. The ways in whichthese different timing parameters function are outlined in thetiming diagrams below. In these diagrams, the start for wait ofDTMF input, or user speech both occur at the time that the lastprompt has finished playing.

D.1. DTMFGrammars

DTMF grammars use timeout, interdigittimeout, termtimeout andtermchar as described inSection6.3.3 to tailor the user experience. The effects of these areshown in the following timing diagrams.

timeout, No Input Provided

The timeout parameter determines when the <noinput>event is thrown because the user has failed to enter any DTMF(Figure 12). Once the first DTMF has been entered, thisparameter has no further effect.

Figure 12: Timing diagram for timeout when no input provided.

interdigittimeout, Grammar is Not Ready toTerminate

In Figure 13, the interdigittimeout determines when thenomatch event is thrown because a DTMF grammar is not yetrecognized, and the user has failed to enter additional DTMF.

Figure 13: Timing diagram for interdigittimeout, grammar is notready to terminate.

interdigittimeout, Grammar is Ready toTerminate

The example below shows the situation when a DTMF grammarcould terminate, or extend by the addition of more DTMF input,and the user has elected not to provide any further input.

Figure 14: Timing diagram for interdigittimeout, grammar is readyto terminate.

termchar and interdigittimeout, Grammar CanTerminate

In the example below, a termchar is non-empty, and is enteredby the user before an interdigittimeout expires, to signify thatthe users DTMF input is complete; the termchar is not included aspart of the recognized value.

Figure 15: Timing diagram for termchar and interdigittimeout,grammar can terminate.

termchar Empty When Grammar MustTerminate

In the example below, the entry of the last DTMF has broughtthe grammar to a termination point at which no additional DTMF isexpected. Since termchar is empty, there is no optionalterminating character permitted, thus the recognition ends andthe recognized value is returned.

Figure 16: Timing diagram for termchar empty when grammar mustterminate.

termchar Non-Empty and termtimeout WhenGrammar Must Terminate

In the example below, the entry of the last DTMF has broughtthe grammar to a termination point at which no additional DTMF isallowed by the grammar. If the termchar is non-empty, then theuser can enter an optional termchar DTMF. If the user fails toenter this optional DTMF within termtimeout, the recognition endsand the recognized value is returned. If the termtimeout is 0s(the default), then the recognized value is returned immediatelyafter the last DTMF allowed by the grammar, without waiting forthe optional termchar. Note: the termtimeout applies onlywhen no additional input is allowed by the grammar; otherwise,the interdigittimeout applies.

Figure 17: Timing diagram for termchar non-empty and termtimeoutwhen grammar must terminate.

termchar Non-Empty and termtimeout WhenGrammar Must Terminate

In this example, the entry of the last DTMF has brought thegrammar to a termination point at which no additional DTMF isallowed by the grammar. Since the termchar is non-empty, the userenters the optional termchar within termtimeout causing therecognized value to be returned (excluding the termchar).

Figure 18: Timing diagram for termchar non-empty when grammarmust terminate.

Invalid DTMF Input

While waiting for the first or additional DTMF, threedifferent timeouts may determine when the user's input isconsidered complete. If no DTMF has been entered, the timeoutapplies; if some DTMF has been entered but additional DTMF isvalid, then the interdigittimeout applies; and if no additionalDTMF is legal, then the termtimeout applies. At each point, theuser may enter DTMF which is not permitted by the activegrammar(s). This causes the collected DTMF string to be invalid.Additional digits will be collected until either the termcharis pressed or the interdigittimeout has elapsed. A nomatch eventis then generated.

D.2. SpeechGrammars.

Speech grammars use timeout, completetimeout, andincompletetimeout as described inSection 6.3.4 andSection 6.3.2 to tailor the user experience. Theeffects of these are shown in the following timing diagrams.

timeout When No Speech Provided

In the example below, the timeout parameter determines whenthe noinput event is thrown because the user has failed tospeak.

Figure 19: Timing diagram for timeout when no speechprovided.

completetimeout With Speech GrammarRecognized

In the example above, the user provided a utterance that wasrecognized by the speech grammar. After a silence period ofcompletetimeout has elapsed, the recognized value isreturned.

Figure 20: Timing diagram for completetimeout with speech grammarrecognized.

incompletetimeout with Speech GrammarUnrecognized

In the example above, the user provided a utterance that isnot as yet recognized by the speech grammar but is the prefix ofa legal utterance. After a silence period of incompletetimeouthas elapsed, a nomatch event is thrown.

Figure 21: Timing diagram for incompletetimeout with speechgrammar unrecognized.

Appendix E — AudioFile Formats

VoiceXML requires that a platform support the playing andrecording audio formats specified below.

Table 65: Audio Formats Which PlatformsMust Support
Audio Format	Media Type
Raw (headerless) 8kHz 8-bit monomu-law [PCM] single channel. (G.711)	audio/basic (from[RFC1521])
Raw (headerless) 8kHz 8 bit monoA-law [PCM] single channel. (G.711)	audio/x-alaw-basic
WAV (RIFF header) 8kHz 8-bit monomu-law [PCM] single channel.	audio/x-wav
WAV (RIFF header) 8kHz 8-bit monoA-law [PCM] single channel.	audio/x-wav

The 'audio/basic' mime type is commonly used with the 'au'header format as well as the headerless 8-bit 8Khz mu-law format.If this mime type is specified for recording, the mu-law formatmust be used. For playback with the 'audio/basic' mime type,platforms must support the mu-law format and may support the 'au'format.

Appendix F —Conformance

This section is Normative.

F1. ConformingVoiceXML Document

Aconforming VoiceXML document is a well-formed[XML] document that requiresonly the facilities described as mandatory in this specification.Such a document must meet all of the following criteria:

The document must conform to the constraints expressed in theVoiceXML Schema (AppendixO).
The root element of the document must be <vxml>.
The <vxml> element must include a "version" attributewith the value "2.0".
The <vxml> element must designate the VoiceXMLnamespace. This can be achieved by declaring an "xmlns"attribute or an attribute with an "xmlns" prefix[XMLNAMES]. The namespacefor VoiceXML is defined to be http://www.w3.org/2001/vxml.
It is recommended that the <vxml> element alsoindicate the location of the VoiceXML schema (seeAppendix O) via thexsi:schemaLocation attribute from[SCHEMA1]:
```
xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
```
Although such indication is not required, to encourage it thisdocument provides such indication on all of the examples.
There may be a DOCTYPE declaration in the document prior tothe root element. If present, the public identifier included inthe DOCTYPE declaration must reference the VoiceXML DTD (Appendix B) using its FormalPublic Identifier.
```
<!DOCTYPE vxml      PUBLIC "-//W3C//DTD VOICEXML 2.0//EN"      "http://www.w3.org/TR/voicexml20/vxml.dtd">
```
The system identifier may be modified appropriately.
The DTD subset must not be used to override any parameterentities in the DTD.

Here is an example of a Conforming VoiceXML document:

<?xml version="1.0" encoding="UTF-8"?><vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/vxml  http://www.w3.org/TR/voicexml20/vxml.xsd"> <form>  <block>hello</block> </form></vxml>

Note that in this example, the recommended "xmlns:xsi" and"xsi:schemaLocation" attributes are included as is an XMLdeclaration. An XML declaration like the one above is notrequired in all XML documents. VoiceXML document authors arestrongly encouraged to use XML declarations in all theirdocuments. Such a declaration is required when the characterencoding of the document is other than the default UTF-8 orUTF-16 and no encoding was determined by a higher-levelprotocol.

The VoiceXML language or these conformance criteria provide nodesignated size limits on any aspect of VoiceXML documents. Thereare no maximum values on the number of elements, the amount ofcharacter data, or the number of characters in attributevalues.

F.2 UsingVoiceXML with other namespaces

The VoiceXML namespace may be used with other XML namespacesas per[XMLNAMES],although such documents are not strictly conforming VoiceXMLdocuments as defined above. Future work by W3C will address waysto specify conformance for documents involving multiplenamespaces.

F.3 ConformingVoiceXML Processors

A VoiceXML processor is a user agent that can parse andprocess Conforming VoiceXML documents.

In aConforming VoiceXML Processor, the XML parsermust be able to parse and process all well-formed XML constructsdefined within[XML] and[XMLNAMES]. It is notrequired that a Conforming VoiceXML processor use a validatingparser.

AConforming VoiceXML Processor must be aConforming Speech Synthesis Markup Language Processor[SSML] and aConforming XMLGrammar Processor[SRGS]except for differences described in this document. If a syntaxerror is detected processing a grammar document, then an"error.badfetch" event must be thrown.

AConforming VoiceXML Processor must support thesyntax and semantics of all VoiceXML elements as described inthis document. Consequently, aConforming VoiceXMLProcessor must not throw an'error.unsupported.<element>' for any VoiceXML elementwhich must be supported when processing aConforming VoiceXMLDocument.

When aConforming VoiceXML Processor encounters aConforming VoiceXMLDocument with non-VoiceXML elements or attributes which areproprietary, defined only in earlier versions of VoiceXML, ordefined in a non-VoiceXML namespace, and which cannot beprocessed, then it must throw an "error.badfetch" event.

When aConforming VoiceXML Processor encounters adocument with a root element designating a namespace other thanVoiceXML, its behavior is undefined.

There is, however, no conformance requirement with respect toperformance characteristics of the VoiceXML Processor.

Appendix G—Internationalization

VoiceXML is an application of[XML] and thus supports[UNICODE] which defines a standard universalcharacter set.

Additionally, VoiceXML provides a mechanism for precisecontrol of the input and output languages via the use of"xml:lang" attribute. This facility provides:

The ability to specify the input and output languageoverriding the VoiceXML Processor default language
The ability to produce multi-language output
The ability to interpret input in a language different fromthe output language(s)

Appendix H—Accessibility

Voice is central to, but not the limit of, VoiceXMLapplications. While speaking and listening will be the mostwidely used techniques in most circumstances and for most usersto interact with VoiceXML applications, some users may be unableto speak and/or listen because of temporary (or permanent)circumstances. Persons with disabilities, particularly those withspeech and/or hearing impairments, may need to interact withVoiceXML applications in other ways:

Hearing impaired users may read text on a display or (if theyare also blind) read Braille text by touch. In order to supportspecial devices for use by persons with speech and/or hearingimpairments, developers are encouraged to provide a textequivalent of each audio prompt within the audio tag: For example
```
<audio src="greetings.wav">Greetings</audio>
```
would normally replay the greetings.wav audio file. However, ifthe VoiceXML interpreter Context has detected that the user isviewing the interaction on a display or is touching Brailleoutput, then the text "Greetings" is rendered by the display orBraille output device.
Speaking impaired users may enter coded sequences that areconverted into alphabetic text before feeding them into theVoiceXML platform. The conversion might be accomplished by aspecial hardware attachment to a telephone that convertskeystrokes from, as examples, a QWERTY keyboard to alphabetictext. The conversion might also be accomplished by software thattranslates sequences of DTMF tones from a 12-key telephone keypadinto alphabetic text.

Providing alternative paths to information delivery and userinput is central to all W3C technologies intended for use bypeople. While initially authored to make on screen contentaccessible, the following accessibility guidelines published byW3C's Web Accessibility Initiative (WAI) also apply toVoiceXML.

Web Content Accessibility Guidelines 1.0[WAI-WEBCONTENT]explains in detail how to make a Web site accessible for peoplewith a variety of disabilities.
Authoring Tool Accessibility Guidelines 1.0[ATAGIO]. For softwaredevelopers, explains how to make a variety of authoring toolssupport the production of accessible Web content, and also how tomake the software itself accessible.
User Agent Accessibility Guidelines 1.0[UAAGIO]. For software developers, explains howto make accessible browsers, multimedia players, and assistivetechnologies that interface with these.
XML Accessibility Guidelines[XAG]. For developers of XML-based applicationsexplains how to ensure that XML-based applications supportaccessibility.

Additional guidelines for enabling persons with disabilitiesto access VoiceXML applications include the following:

Reuse navigation structures that are highly usable andleverage learning across multiple applications--for example, thenavigational techniques for the ANSI/NISO Z39.86-2002 DigitalTalking book Standard, http://www.loc.gov/nls/z3986/.
Each element in which an event can occur should specify catchelements, including one with a fail-soft or recoveryfunctionality.
Enable users to control the length of time before timeout,the speaking rate of synthesized speech, and other such variablesthat provide a little extra time to respond or complete an inputaction, particularly when the VoiceXML interpreter Context hasdetected that the user is using an ancillary device instead oflistening or speaking. These are especially useful for users withcognitive disabilities.
Advertise alternative modes through which equivalent serviceis available, including transfer to a human operator, texttelephone service, etc., or the availability of the sameinformation via the World Wide Web.

Appendix I —Privacy

A future version of VoiceXML may specify criteria by which aVoiceXML Processor safeguards the privacy of personal data.

Appendix J —Changes from VoiceXML 1.0

The following is a summary of the differences between VoiceXML2.0 and VoiceXML 1.0[VOICEXML-1.0].

Developers of VoiceXML 1.0 applications should pay particularattentions to the changes incompatible with VoiceXML 1.0specified inObsoleteElements andIncompatibly Modified Elements.

New Elements

<log> to specify a debug message (5.3.13)
<metadata> as a means of specifying metadatainformation using a schema (6.2,6.2.2)

Obsolete Elements

<dtmf> superceded by <grammar> with "mode=dtmf"(3.1.2)
<emp>, <div>, <pros>, and <sayas>JSML elements have been replaced by corresponding elements inSpeech Synthesis Markup Language[SSML] (4.1.1)

Incompatibly ModifiedElements

changed "lang" to "xml:lang" in <vxml> (1.5.1).
Added required attribute xmlns in <vxml> (1.5.1)
Replaced "base" attribute with "xml:base" in <vxml> (1.5.1)
Changed and clarified that if an implementation does notsupport a specific object, it throwserror.unsupported.objectname. The event error.unsupported.formatis not thrown for unsupported object types (2.1.2.1,2.3.5,5.2.6)
A field's type attribute does not indicate an implicit<say-as> class to use when speaking out the field's value.An explicit <say-as> must be used instead (2.1.4,Appendix P)
added "accept" attribute to <menu> and <choice>(2.2.1,2.2.2)
An error.badfetch (previously, error.semantic) is thrown if a<menu>'s dtmf attribute is set to true and it has choiceswith DTMF sequences specified as something other than "*", "#",or "0" (2.2.1)
Removed required support for access to builtin resources suchas grammars, and 'builtin' is treated as a platform-specific URIscheme for accessing resources (2.3.1.2,Appendix P)
Added 'accept' attribute to <option> and modifieddescription of 'choice phrase' in grammar generation (2.3.1.3,2.2.5)
removed "modal" attribute of <subdialog> (2.3.4)
removed "fetchaudio" attribute from <object>(2.3.5)
Removed capability of <value> to playback a record.Only <audio> can be used to playback recording (2.3.6,4.1.3,4.1.4)
If a platform supports simultaneous speech recognition andrecording, then spoken input can match an active non-localspeech grammar. If local speech grammars are specified, theyare treated as inactive (i.e. they are ignored), even if theplatform supports simultaneous recognition and recording(2.3.6)
replaced "phone" URI schema with "tel" schema in<transfer> dest (and destexpr) attribute (2.3.7)
during bridged transfer, while bargein operates normally, thebargeintype is fixed to "hotword", grammar activation is modal(only local grammars are activated), and transferaudio beginsplaying at the point the outgoing call begins (2.3.7)
removed notational equivalence between <filled> in afield and a form-level <filled> triggering on that field(2.4)
The treatment of the 'type' in the <grammar> elementfollows standard W3C practise. If a media type is returned by theprotocol, then it is authoritative: it cannot be override by theVoiceXML platform even if it does not match the actual media typeof the resource or cannot be processed as a grammar. The value ofthe 'type' attribute may be used to influence content typenegotiation (in HTTP 1.1 for example) and, only if no media typeis returned by the protocol, becomes the authorative media typefor the resource (3.1.1.2,3.1.1.4)
TTS content in <choice>, <prompt>,<enumerate>, and <audio> replaced with definition inSpeech Synthesis Markup Language[SSML])
in <audio>, if the audio file cannot be played and thecontent of the element is empty, no audio is played and no errorevent is thrown (4.1.3)
removed "class", "mode" and "recsrc" attributes from<value> (4.1.4)
removed standard session variable "session.uui". Addednew generic session variables "session.connection.protocol.name"and "session.connection.aai", which would provide thisinformation and more (5.1.4)
Replaced session.telephone variable space withsession.connection space which is not protocol-specific and moreextensible. Corresponding error names also changed. (5.1.4)
The mechanism by which ECMAScript objects in the namelist of<submit> are submitted is not currently defined butreserved for future definition. Application developers mayexplicit submit properties of the object instead of the objectitself (5.3.8)
removed "caching" attribute (6.1)
added "maxage" and "maxstale" attributes (6.1)
removed "stream" as value of fetchhint property (6.1.1,6.3.5)
removed "caching" from fetching properties (6.3.5).
Platform-specific universal command grammars are optional (6.3.6)

ModifiedElements

Platforms may make a distinction between field and utterancelevel confidence: i.e. field$.confidence andapplication.lastresult$.confidence may differ (2.3.1,3.1.6.4,5.1.5,6.3.2 )
Added 'srcexpr' attribute to <subdialog>(2.3.4)
added "maxtime" shadow variable to <record>(2.3.6)
added "transferaudio" attribute to <transfer>; added"maxtimedisconnect", and "unknown" as values of bridge transfers;added more error.connection events (2.3.7)
added "aai" and "aaiexpr" attributes to allow data passingwith <transfer> (2.3.7)
added 'dtmf' attribute to <link> (2.5).
the XML Form of the W3C Speech Recognition GrammarSpecification[SRGS] must besupported in <grammar> (3.1).
added "weight", "mode", "xml:lang", "root" and "version"attributes to <grammar> (3.1).
added "xml:lang" attribute to <prompt> (4.1).
added "bargeintype" attribute to <prompt> with values"speech" and "hotword" (4.1).
added "expr" attribute to <audio> element (4.1.3)
added application variable "application.lastresult$"describing last recognition result, including n-best (5.1.5).
added "event", "eventexpr", "message" and "messageexpr" asattributes of <throw>, <choice>, <link>, and<return> (5.2.1).
added "_event" variable to <catch> (5.2.2).
<catch> is no longer allowed to specify an eventattribute with an empty string value (5.2.2,5.2.4)
added "error.badfetch.http.nnn" as pre-defined error type (5.2.6)
added "error.badfetch.protocol.<response code>" aspre-defined error type (5.2.6)
added "maxspeechtimeout" event (5.2.6)
added "error.unsupported.language" pre-defined error type (5.2.6)
added required support for "multipart/form-data" value as"enctype" of <submit> (5.3.8)
<script> can occur in <form> element (5.3.12)
Failure to retrieve fetchaudio from its URI does not resultin a badfetch event being thrown; instead, no audio is playedduring the fetch (6.1.1)
HTTP is mandatory (6.1.4)
added "maxspeechtimeout" property (6.3.2)
Platform support for the completetimeout property isoptional. However, a platform which does not support thisproperty must use the maximum of the completetimeout andincompletetimeout values as the value for the incompletetimeout,and must document it (6.3.2)
added "bargeintype" property (6.3.4)
added "fetchaudiodelay" and "fetchaudiominimum" to fetchproperties (6.3.5)
added "maxnbest" session property (6.3.6)
added "universals" property (with default "none") (6.3.6).
Added defaults for fetching attributes, as well as "maxage"and "maxstale" fetching attributes to: <choice>,<subdialog>, <object>, <link>, <grammar>,<audio>, <goto>, <submit> and<script>

Clarifications

Clarification how grammar results are mapped into VoiceXMLincluding: notion of "input items" for form input which acceptinput; only input items can be filled as a result of a form-levelgrammar match; field-level grammar matches cannot fill inputitems other than the current field; clarified that <object>could be filled and trigger filled actions; added a designprinciple for semantic mapping and effects on lastresult$, shadowvariables and process phase in the FIA (1.2.4,2.1.4,2.1.5,2.1.6.2.3,2.2,2.3.1,2.3.1.3,2.3.5,2.3.6,2.3.7.2,2.4,2.5,3.1.1,3.1.6,3.1.6.1,3.1.6.2,3.1.6.3,3.1.6.4,Appendix C).
if no audio input or output resource is available, anerror.noresource must be thrown (1.2.5,5.2.6)
Replaced misleading term 'field' item with 'form' or 'input'item as appropriate (1.3.1,2.1.6.2.2,2.3,2.3.3,2.3.5,3.1.6.1,4.1.3,5.1.1,5.1.3,6.3)
Clarified that application-level grammars may remain activefor the duration of the application subject to the grammaractivation rules inSection3.1.4 (1.3.3)
definitions of, and transitions between, root and leafdocument (1.5.2).
referencing application root document and its grammars (1.5.2).
When a subdialog is invoked with only a fragment identifier,the root and leaf pages remain unchanged and these pages are usedto initialize root and leaf contexts (1.5.2)
In a root to root transition, root context initialization isdetermined by the caching policy even when the current and targetapplications have the same name (1.5.2)
Clarification of URI transitions, especially fragmentidentifiers, in relation to RFC2396 (1.5.2,2.3.4,5.3.7,5.3.8,6.1.1)
Clarification of how root documents are treated inmulti-document applications and the benefits of using rootdocuments (1.5.2)
An error.badfetch is thrown when a document references anon-existent root document, and that an error.semantic is thrownif it references a root document which itself references a rootdocument (1.5.2)
<subdialog> transferring control to another<subdialog> and to another dialog via <goto> (1.5.3).
Added section describing final processing state when there isno longer a connection between the interpreter and the user.Removed description of final processing in <catch>. (1.5.4,5.2.2)
scope specified on individual form grammars takes precedenceover the default grammar scope in a <form> (2.1).
behaviour when executing unsupported <object> instances(2.1.2.1,2.3.5).
In <object>, when a platform does not support aparticular object, an error.unsupported.objectname event isthrown where 'objectname' is a fixed string and isnotsubstituted with the name of the particular object. In general,the substitutable components of an event name are indicated byitalics (e.g.object in error.unsupported.object)(2.1.2.1,2.3.5,5.2.6 )
If a platform does not support a specific <object>,then error.unsupported.objectname is thrown (2.1.2)
Multiple prompts in a field do not need to have countattributes. One or more prompts in a field are queued forplayback according to the prompt selection algorithm in Section4.1.6 (2.1.4)
effect of <goto nextitem> on form item (2.1.5).
no variables, conditions or counters are reset when using< goto nextitem> (2.1.5).
Clarified that mixed initiative dialogs require forms withform-level grammars and that there are many authoring styles formixed initiative including using <initial> and condattributes on <field> elements (2.1.5)
<goto nextitem> forces an immediate transfer to thespecified form item, even if any cond attribute present on theform item evaluates to "false". (2.1.5.1)
behaviour of <transfer>, <subdialog> and<object> with audio playback in collect phase (2.1.6).
Event handler selection in FIA process phase and<filled> (2.1.6.2).
Clarified that when errors raised in the select or collectphases of the FIA result in an event being thrown, the FIA movesdirectly to the process phase (2.1.6.2,2.1.6.2.1,2.1.6.2.3)
When an error is thrown in executable content, no subsequentexecutable elements in the procedural block are executed and, ifthere is no explicit transfer of control, an implicit<exit> is performed (2.1.6.2.1,5.3)
enumerated executable context elements which cause executionto terminate (2.1.6.2.3).
<reprompt> does not terminate the FIA (2.1.6.2.3).
Clarified that 'false' is the default value of the dtmfattribute of <menu> (2.2.1)
Clarified specification and behavior of mutually exclusiveattributes and child content (2.2.2,2.3.4,2.3.7,2.5,3.1.1.4,4.1.3,5.2.1,5.3.7,5.3.8,5.3.9,5.3.10,5.3.12,6.4)
In <menu> it is a semantic error if dtmf="true" but<choice>s have explicitly specified dtmf other than 0,* and#. If more than 9 choices without specified dtmf, then no dtmf isautomatically assigned (no dtmf input can match the choice) butno error is generated (2.2.3)
use of <enumerate> (2.2.4,2.3.1)
<grammar> overrides automatically generated grammars in<choice> (2.2.2).
<choice> expr evaluates to URI to transition to (2.2.2).
<choice> event handler without control transition,causes menu to be re-executed (2.2.2).
DTMF sequences specified in the dtmf attributes in<choice>, <option> and <link> are equivalent tosimple DTMF grammars where DTMF properties apply to recognitionof the sequence. However, unlike grammars, whitespace is optionalin DTMF sequences (2.2.2,2.3.1.3,2.5)
Clarified that DTMF and speech grammars, but not grammarfragments, are allowed in <choice> (2.2.2)
For <enumerate>, if no DTMF sequence is assigned to thechoice element, or if a <grammar> element is specified in<choice>, then the _dtmf variable is assigned theECMAScript undefined value (2.2.4)
For <enumerate>, the value of _dtmf is a normalizedrepresentation of the dtmf sequence (i.e. single whitespacebetween DTMF tokens) (2.2.4)
Specification of approximate grammar generation in<menu> and <choice> (2.2.5)
A form item is executed if it is not filled and its condattribute is not specified or evaluates to true (2.3,2.3.1)
Reorganized overview of form items to clarify whichcharacteristics apply to which form items. Also indicated thatthe <initial> form item can contain <property> and<catch> elements (2.3)
Evaluation of an 'cond' expression takes place afterconversion to boolean. This affects the 'cond' attribute in formitems <field>, <block>, <initial>,<subdialog>, <object>, <record>, and<transfer> (2.3);<prompt> (4.1); and<catch> (5.2.2,5.2.4)
Clarified that shadow variables are writeable and can bemodified by the application. Changed 'application.lastresult$' sothat it is also writeable and can be modified by the application(2.3,5.1.5 )
assignment of field variable when DTMF attribute defined (2.3.1).
The name of a field must be unique amongst form item names inits form. Variables declared in a <script> element aredeclared in the scope of the containing element of the<script> element (2.3.1,5.3.12)
form item variable names must respect ECMAScript variablenaming conventions (2.3.1,5.1).
If a specified builtin type of <field> is not supportedby the platform, an error.unsupported.builtin event is thrown. Ifa platform does support builtin types, then it must support allthe builtin types in a given language (2.3.1,5.2.6,Appendix P)
use of DTMF and speech grammars with "builtin:" URI scheme(2.3.1.2).
string returned when DTMF input when no "value" or CDATAspecified in <option> (2.3.1.3).
<option> and <grammar> can be used simultaneouslyto specify grammars in a <field> (2.3.1.3).
In an <option>, if neither CDATA content nor a dtmfsequence is specified, then the default assignment for the valueattribute is undefined and the field's form item variable is notfilled (2.3.1.3)
In an <option>, the dtmf attribute is optional. if novalue is specified for the dtmf attribute, then no DTMF sequenceis associated with the option and hence it cannot be matched byDTMF input (2.3.1.3)
Normal grammar scoping rules apply when visiting<initial>; in particular, no input item grammars areactive (2.3.3)
Clarified that a form allows multiple <initial>elements, and how they are selected for execution (2.3.3,Appendix C)
scope of variables in <subdialog> (2.3.4).
<subdialog> context is independent of its callingcontext (variable instances are not shared) but its contextfollows normal scoping rules for grammars, events, and variables(2.3.4).
in <subdialog> use "expr" attribute to set variable ifno corresponding <param> specified (2.3.4).
clarified description of subdialog's execution context (2.3.4)
Clarification of how <return> passes data to itscalling dialog in <subdialog> (2.3.4,5.3.10)
Variables in subdialogs are matched to parameters by name andin document order; parameter values are evaluated in the contextof the <param> element (2.3.4)
An error.badfetch is thrown when an invalid transition isattempted in <subdialog>, <goto> and <submit> .The scope in which errors are handled during transitions isplatform-dependent (2.3.4,5.3.7,5.3.8)
Clarified that a <subdialog> without a <return>continues until it encounters a <exit> or until no formitems remain eligible for the FIA to select (equivalent to an<exit>) (2.3.4)
Clarified that a standalone query string is not a valid URI:no special handling of them is therefore required in transitionalURIs specified in <subdialog> and <goto> (2.3.4,5.3.7,6.1.1)
If the namelist attribute in <subdialog>,<submit>, <clear>, <exit> or <return>references an undeclared variable, then an error.semantic eventis thrown (2.3.4,5.3.3,5.3.8,5.3.9,5.3.10 )
In a <subdialog>, parameters must be declared as<var> elements in the form executed as the subdialog or anerror.semantic will be thrown. (2.3.4,6.4)
Clarified that an <object> itself is responsible fordetermining whether a parameter name or value it receives isinvalid. If so, an error is thrown (it may be a standard error oran object-specific error) (2.3.5)
user hangup during recording terminates recording normally;data recorded prior to hangup can be returned to server (2.3.6).
interpretation of grammars in <record> (2.3.6).
field variable in <record> is a reference to recordedaudio. When submitting recorded data to a server, the "enctype"of <submit> should be set to "multipart/form-data" (2.3.6,5.3.8).
Clarification of behavior when <record> dtmftermattribute set to false and DTMF input received (2.3.6)
Clarification of when recording starts and behavior whenrecording terminated before any audio data is collected (2.3.6)
Clarified that how the <record> variable is implementedmay differ between platforms (although all platforms must supportits specified behavior in <audio> and <submit> (2.3.6)
Clarified that the finalsilence and maxtime attributes of<record> default to platform-specific values (2.3.6)
During execution of the <record> element, if no audiois collected before the user terminates recording with DTMF inputmatching a local DTMF grammar (or when the dtmfterm attribute isset to true), then the record variable is not filled (so shadowvariables are not set), and the FIA applies as normal without anoinput event being thrown. However, information about the inputmay be available in these situations via application.lastresult$as described in Section 5.1.5. (2.3.6)
In the <record> element, the dtmfterm attribute haspriority over specified local DTMF grammars (2.3.6)
In <record>, no audio may be collected if DTMF orspeech input is received during prompt playback orbeforethe timeout intervalexpires (2.3.6)
termination of <transfer> by speech or DTMF returnsnear_end_disconnect status (2.3.7).
value of "dest" attribute on <transfer> (2.3.7).
the form item variable of <transfer> is undefined forblind transfer (2.3.7).
Revision of <transfer> including: error events whenplatform unable to handle 'dest'/'destexpr' URI; clarificationthat platform is disconnected immediately when blind transfertakes place; specification of events to be thrown if platformcannot perform blind or bridged transfer; clarification that theconnection status is not available for blind transfer (althoughsome error conditions may be reported) ; transferaudio is ignoredfor blind transfer; clarification of audio playback before andduring bridged transfer, including situation where transferaudioends before connection established and that queued audio isflushed before starting transfer; clarification of timings forlistening for input and playback of audio; added name$.inputmodeand name$.utterance shadow variables; clarified that platformsupport for listening for input during transfer is optional; (2.3.7,5.2.6)
The bargeintype on bridged <transfer> is fixed to"hotword" for the duration of the outgoing call (2.3.7)
Platforms supporting either blind or bridged <transfer>may support bargein input modes of DTMF, speech, or both, duringthe call transfer to drop the far-end. In both blind and bridgedtransfer, if the transfer was not terminated by a grammar match,the shadow variablename$.inputmode is undefined. Blindtransfer attempts can only be cancelled up to the point theoutgoing call begins. In blind transfer, the format ofplatform-specific error conditions should follow the namingconvention of other transfer form item variable values. Thecaller can cancel a blind transfer attempt before the outgoingcall begins by barging in with a speech or DTMF command thatmatches an active grammar during the playback of any queuedaudio; in this case the form item variable is set, its shadowvariables are set as is application.lastresult$. If the callerdisconnects by hanging up during a blind transfer attempt beforethe connection to the callee begins, aconnection.disconnect.hangup event will be thrown, and dialogexecution will transition to a handler for the hangup event. Theform item variable, and thus shadow variables, will not be set.If the caller cancels the blind transfer attempt via a DTMF orvoice command before the outgoing call begins (during playback ofqueued audio), the form item variable is set to'near_end_disconnect'. In bridged transfer, the caller can cancelthe transfer attempt before the outgoing call begins by bargingin with a speech or DTMF command that matches an active grammarduring the playback of any queued audio. (2.3.7)
Clarified that <transfer> variable and shadow variablesare not set if caller hangs up during call transfer or a calltransfer attempt. If a call is terminated by the caller with avoice or DTMF command prior to the call being answered, theduration shadow variable is set to 0 (2.3.7.2.2)
Clarified that <transfer> utterance shadow variable isset to the DTMF result if the transfer was terminated by DTMFinput (2.3.7.2.2)
Addressed situation in a bridged transfer where caller forcescallee to disconnect via DTMF or voice command before theconnection is established: (2.3.7.2.2)
In <transfer>, the shadow variable name$.inputmode isundefined if the transfer was not terminated by a grammar match.(2.3.7.2.2)
Upon encountering a document containing a <filled>element specifying either a 'mode' or 'namelist' attribute as achild of an input item, then an error.badfetch is thrown by theplatform. In addition, an error.badfetch is thrown when thedocument contains a <filled> element with a namelistattribute referencing a control item variable (2.4)
<link>s have zero or more grammars (2.5).
events throw by a <link> are handled by the bestqualified <catch> in active scope (2.5).
<link> can be a child of the form items <field>and <initial> only (2.5)
Clarified that a "scope" attribute on the element containinga <link> element has no effect on the scope of the<link>s grammars (2.5)
Clarified that in <link>, any URIs in its content (e.g.<grammar>s) are evaluated/resolved where the <link>is defined, while any URIs and ECMAScript expressions in itsattributes are evaluated/resolved in the active dialog scope andcontext (2.5)
In a <link>, grammars are not allowed to specify scopeas described in3.1.3 (2.5)
If execution is in a modal form item, then link grammars atthe application level are not active (2.5)
"xml:lang" attribute in <grammar> does not requiremulti-lingual support from platform (3.1)
unsupported grammar language results inerror.unsupported.language event being thrown (3.1.1)
unsupported language can be indicated in the message variableof a <throw> (3.1.1).
'number' results as a string which ECMAScript willautomatically convert to a number in a numerical expression;string must not use a leading zero (3.1.1)
Clarified that the SRGS <grammar> element is extendedin VoiceXML 2.0 to allow PCDATA for inline grammar formatsbesides the XML format of SRGS (3.1.1,3.1.1.4 )
implicit grammars (such as options) do not support weights(3.1.1.3).
The type attribute in <grammar> takes precedence overother possible source of media type; if specified, and inconflict with the type of the grammar, then an error is thrown(3.1.1.2,3.1.1.4)
Clarified the use and interpretation of <grammar>attributes inherited from SRGS (version, xml:lang, mode, root,tag-format, xml:base). Inline XML SRGS grammars follow thebehavior specified in SRGS. For inline ABNF SRGS grammars as wellas external ABNF and XML SRGS grammars the platform must ignorethese attributes. For all other grammar types, the use andinterpretation of these attributes is platform-dependent (3.1.1.4)
Clarified that the root rule in SRGS grammars does not haveto be a public rule (3.1.1.4)
The forms of rule reference defined by SRGS that are notsupported in VoiceXML 2.0 only apply when referencing externalgrammars using 'src' attribute (3.1.1.4)
Clarified the distinction between form-level and field-levelgrammars (3.1.6,3.1.6.1,3.1.6.2 )
"slot" can select properties at arbitrary levels of nestingusing dot-separated list; removed text suggesting that arrayindexing expressions (e.g. "pizza.toppings[3]") are supported (3.1.6.1)
Clarified that matching form-level grammars can overrideexisting values in input items and that <filled> processingof these items takes place as describedSection 2.4 andAppendix C (3.1.6.1)
Aligned description of DTMF grammar with that of speechgrammars: DTMF grammars can return a set of attribute-value pairsas well as a string value (3.1.2)
If a document contains a grammar specifying a scope and thatgrammar is contained a field, in a <link> or in a menu<choice>, then error.badfetch is thrown (3.1.3)
If no grammars are active when input is expected in a<form> or <menu>, an error.semantic event is thrown(3.1.4)
The inputmodes property does not affect grammar activation(3.1.4,6.3.6)
If the input matches more than one active grammar with thesame precedence, then the first grammar in document order hashighest priority. (3.1.4)
ongoing work on semantic attachments within <grammar>(3.1.5)
Input item variables can be set by semantic results fromother input items (3.1.6)
The default values for a <prompt>'s "bargein" and"bargeintype" attributes are determined by the "bargein" and"bargeintype" properties (4.1)
Clarified that a time designator is a non-negative numberwhich must be followed by ms or s. Clarified that the followingattributes take time designators as their value: <prompt> -timeout; <transfer> - maxtime (NB: default is now "0s"),connecttimeout; <record> - maxtime, finalsilence. Clarifiedthat the following properties have time designator values:fetchtimeout, completetimeout, incompletetimeout,maxspeechtimeout, interdigittimeout, termtimeout, timeout,fetchaudiodelay, fetchaudiominimum, fetchtimeout (4.1,2.3.6,2.3.7,6.1.1,6.3)
"xml:lang" attribute in <prompt> does not requiremulti-lingual support from platform (4.1.1)
unsupported synthesis language results inerror.unsupported.language event being thrown (4.1.1).
enclosing <prompt> required if text contains speechsynthesis markup (4.1.2).
When prompt content is specified without an explicit<prompt> element then the prompt attributes are defined asspecified in the table in Section 4.1 (4.1.2)
'alternate content' in <audio> (4.1.3).
When <audio> 'expr' evaluates to ECMAScript undefined,the content of the element ignored. If it evaluates to an invalidURI, or the format is unsupported, etc, then the fallbackstrategy is invoked (4.1.3)
It is a platform optimization to stream audio in<audio> (4.1.3)
Clarified that the expr attribute of <audio> is anECMAScript expression which references a previously<record>ed audio, or evaluates to the URI of an audioresource to fetch (4.1.3)
stand-alone <value> element is legal outside<prompt> (4.1.4).
Simplified evaluation of expr in <audio> so that it isnot treated specially: it is CDATA where XML special charactersdo not need to be escaped. It is not treated as an SSML document,or a document fragment. (4.1.4)
buffered DTMF input is deleted when "bargein" is false on<prompt> (4.1.5).
Clarified behavior when bargein occurs during a sequence ofprompts (4.1.5)
Clarified that when a prompt's "bargein" attribute is false,no input is buffered while the prompt is playing (any DTMFalready buffered is discarded) (4.1.5)
The "bargeintype" attribute of <prompt> applies to DTMFinput as well as speech input (4.1.5.1)
When the bargeintype is speech, the prompt is stoppedirrespective of which grammars are active (4.1.5.1)
With the hotword bargeintype, input not matching a grammar isignored even during the timeout period (4.1.5.1)
Prompt counter are also maintained for <initial> itemsin a form (4.1.6)
In prompt selection, whenever the system selects a giveninput item in the select phase of FIA and the FIA performs normalselection and queuing of prompts, the input item's associatedprompt counter is incremented (4.1.6)
Clarified that each <prompt> has its own timeout valueand that a <prompt>'s timeout attribute defaults to thetimeout property when the prompt is queued (4.1.7)
relationship between prompt queueing and input collection (4.1.8).
Asychronous events unrelated to transition execution (e.g.disconnect) are buffered until waiting state before being thrown(4.1.8)
Clarified mapping between interpreter states and FIA; andthat activation of grammars and waiting for input occursimultaneously with playback of prompts (4.1.8)
Clarified that when a prompt bargein attribute is false,input is not collected and DTMF buffered in the transition stateis deleted as described in 4.1.5 (4.1.8)
Platforms may differ in whether or not they discard bufferednon-matching DTMF when an ASR grammar matches input (4.1.8)
VoiceXML and ECMAScript variables are part of the samevariable space;variables declared in ECMAScript can be useddirectly in VoiceXML (5.1).
VoiceXML variable names, including field names, must followECMAScript naming rules; variable name declarations cannotcontain a dot; the field name "a.b" is illegal (5.1).
VoiceXML variables and variable scoping follows ECMAScriptscope chains; as a consequence, references to undeclaredECMAScript variables result in error.semantic being thrown (5.1.1,5.1.2)
scope of variables (5.1.2).
Clarified application and document scoping of variables inapplication root documents (5.1.2)
Dialog scope contains form item variables, not the variablesdefined within each form item (5.1.2)
only some cond operators need to be escaped (5.1.3).
Clarified that a document with a variable x but without aspecified application root, then the variable can be referencedas application.x and document.x (5.1.3)
Clarified that "application.lastresult$" is an ECMAScriptarray (5.1.5)
Clarification on persistence of lastresult applicationvariable. (5.1.5)
Interpretations in lastresult are ordered first byconfidence, and then by scope precedence of grammars (5.1.5,2.3.1,3.1.4)
When a DTMF grammar is matched, the interpretation variableof application.lastresult contains the matched digit string. (5.1.5)
After a <nomatch>, application.lastresult$ is be setbut the values are platform-dependent (5.1.5)
evaluation of relative URIs against active document (5.2).
Clarified that catch elements use the innermost property ofthe element where the event originated, not where the catch isdefined (5.2)
VoiceXML does not generally specify when events are thrown(5.2.1).
Event counters associated with <catch>s are incrementedwhen an event occurs with same full or prefix matching name; thisaffects selection of the catch handler with the'correct count' in Section 5.2.4 (5.2.2)
definition of "event" and "count" attributes of <catch>(5.2.2).
no inherent limitations on <catch> in, for example,case of user hangup (5.2.2).
<catch>'s event may be the string "." indicating thatall events are to be caught (5.2.2).
<catch> without a specified event attribute isequivalent to one with event="." (5.2.2,5.2.4)
Clarified when form event counters are incremented and reset(5.2.2)
<catch> applies to form items except for blocks (5.2.2)
"as if by copy" catch inheritance (5.2,5.2.4).
catch element selection algorithm (5.2.4).
defined prefix match as token match rather than string match(5.2.4).
In the <catch> element, an event attribute with anempty string value is syntactically invalid. To catch all events,the event attribute can be omitted, or specified as "." to prefixmatch all events (5.2.4)
"error.badfetch" pre-defined error type (5.2.6)
error.badfetch is thrown until the document is ready forexecution; whether variable initiation is part of execution isplatform-dependent (5.2.6)
Clarification of situations in which 'error.badfetch' isthrown. A conforming browser may also throw events whose nameextends pre-defined events (5.2.6)
Application-specific and platform-specific event types shoulduse the reversed Internet domain name convention to avoid namingconflicts (5.2.6)
HTTPS is not the same protocol as HTTP (5.2.6)
Errors raised in the first document in a session, and errorsraised before entering the FIA in subsequent loaded documents arehandled in a platform-specific manner (5.2.6)
Removed 'divide by 0' as a run-time error which results inerror.semantic being thrown (ECMAScript does not report an error)(5.2.6)
Clarified that the event error.noauthorization is thrown inmore circumstances than just connection authorization failures(5.2.6)
Clarified that the event error.unsupport.element isonly thrown for VoiceXML 2.0 elements (5.2.6)
The name attribute of a <var> element specifies avariable without a scope prefix. If it specifies a variable witha scope prefix, then an error.semantic event is thrown (5.3.1)
Clarified that an error.semantic event is thrown if anattempt is made to assign to an undeclared variable. Propertiesof ECMAScript objects, e.g. o.foo, can be assigned directly;attempting to declare them results in an error.semantic eventbeing thrown (5.3.2)
The name attribute of an <assign> element mustreference a variable which has been previously declared otherwisean error.semantic event is thrown. By default, the scope in whichthe variable is resolved is the closest enclosing scope of thecurrently active element. To remove ambiguity, the variable namemay be prefixed with a scope name (5.3.2)
The namelist of <clear> may specify variables otherthan form item variables which are to be reset (5.3.3)
The variable references in <clear>'s namelist areresolved relative to the current scope according to Section 5.1.3(5.3.3)
effect of <reprompt> in catch elements (5.3.6)
behaviour of <reprompt> when in <catch> withfinal <goto> (5.3.6)
FIA performs normal prompt queueing after the execution ofcatch elements when they end with a <submit> or<return> as well as <goto> (5.3.6,AppendixC)
Clarified that a <reprompt> element has no effectoutside of a catch (5.3.6)
effect of URI in <goto> on document variables (5.3.7)
Clarified for <goto> that errors which occur duringform item transition, the event is handled in the dialog scope(5.3.7)
When nextitem or expritem in <goto> reference annon-existent form item, an error.badfetch event is thrown (5.3.7)
variables declared in VoiceXML or ECMAScript can be submitted(5.3.8).
Clarified some of the circumstances in which <submit>can be satisfied by intermediate caches (5.3.8)
In the <submit> element, the enctype attribute is onlyrelevant for when the method attribute is set to "post" (5.3.8)
<exit> does not throw an "exit" event (5.3.9)
The value of the 'expr' attribute of <exit> is anECMAScript expression (5.3.9)
Executing <disconnect/> causes the interpreter to (a)enter the final processing state and (b) flush the prompt queue(5.3.11)
no "type" attribute of <script> (5.3.12).
<script> evaluated along with <var> elements andform item variables in <form> (5.3.12)
definition of "charset" in <script> (5.3.12)
Handling of <log> is platform-dependent (5.3.13)
The label and expr attributes of the <log> element areoptional. (5.3.13)
revised prefetch (6.1)
effect of "fetchhint" attribute (6.1.1)
caching policy selection (6.1.2)
caching follows HTTP 1.1 cache correctness rules (6.1.2)
Clarified that there is no markup mechanism to control thecaching of application root documents (6.1.2.1)
Clarified that first type of <meta> is expressed by theattributes name and content, and the second type by http-equivand content (6.2.1)
When different values for a <property> are specified atthe same level, the last one in document order applies (6.3)
Properties can be set in field input items but not controlitems (6.3)
Clarified that if a platform detects a property has illegalvalue, then it should throw an error.semantic (6.3)
format of platform-specific properties (6.3.1)
definitions of "completetimeout" and "incompletetimeout"speech recognizer properties (6.3.2)
Universal commands grammars are always active except in thecase of modal input items (6.3.6)
Parameter values passed to <subdialog>s are always data(6.4)
definition of time designation values (6.5)
Clarified that the number format is that used in CSS2, andthat the value of the ASR properties confidencelevel,sensitivity, and speedvsaccuracy are in this format (6.5,6.3.2)
Restricted field names, name attribute in <var> andnextitem attribute in <goto> to NMTOKEN; extended nameattribute in <assign> to be like NMTOKEN but also allow '$'(for shadow variable assignments); restricted namelist attributein <filled> to NMTOKENS; extended namelist attribute in<exit>, <submit>, <clear>, and <return>to be like NMTOKENS but allow '$' (for shadow variablesubmissions) (Appendix B,Appendix O)
Restricted the content model of <choice> to PCDATA and<grammar> elements; clarified that <enumerate> cannotoccur inside another <enumerate> (Appendix B,Appendix O,2.2.4)
Clarified that the DTD (unlike the schema) cannot correctlyexpress that the <metadata> element can contain elementsfrom other XML namespaces (Appendix B)
The DTD specifies the xmlns attribute of <vxml> asFIXED and has the default value 'http://www.w3.org/2001/vxml' (Appendix B)
Aligned DTD and schema with text so that accept attribute on<choice> does not default to 'exact' if unspecified (Appendix B,Appendix O)
FIA clarified that application.lastresult$ assignment happensafter every successful recognition (Appendix C)
FIA clarified for matching <link> grammars inside thecurrent form or menu and for matching menu <choice>grammars outside the current form or menu (Appendix C)
FIA corrected that collection of active grammars does notinclude grammars from elements in the <subdialog>'s callchain (Appendix C)
FIA's Initialization Phase clarified for initialization of<script> elements and form items (Appendix C)
Clarified that events may be generated at several pointsduring FIA execution, and that how they are handled depends uponwhich FIA phase is active (Appendix C)
Clarified that in the FIA collect phase, only prompts frominput items and <initial> are selected and their promptcounter incremented. The queueing of prompts in a <block>takes place when the form item is executed (Appendix C)
In the process phase of the FIA, <filled> actions arenot only triggered by utterance input - for example, they canalso be triggered when maxtime is reached during a <record>execution (Appendix C)
Clarified the use of various timeouts for DTMF input (Appendix D)
If a Conforming processor cannot process a non-standardVoiceXML element or attribute, then it must throw anerror.badfetch error (Appendix F)
explanatory notes on portable use of builtins and expectedplatform dependence (Appendix P)
parameterization of builtin DTMF and speech grammars (Appendix P).
handling of contradictory parameters to digits builtin (Appendix P).
result value returned from "number" builtin type (Appendix P).
Currency code not specified if not spoken (Appendix P).
Only the digit and boolean grammars can be parameterized (Appendix P)
Description of rendering builtin values using <say-as>(Appendix P)
Speech or DTMF <grammar>s in a <field> with aspecified builtin type are in addition to the builtin grammars;they do not override them (Appendix P)

Miscellaneous

Updated examples with XML encoding attribute, recommendedschema attributes, and escaped illegal XML characters (<,>, &, etc...)
Using tentative media types (e.g. "application/srgs+xml")submitted to IETF for approval
Added section describing the origins of VoiceXML and how itrelates to other work in the area (1)
Specified set of required audio formats for <audio> and<record> (1.2.4).
capabilities of conforming VoiceXML platform with respect tospeech and dtmf grammars, audio, TTS, record, and transfersupport (1.2.5).
platforms should identify themselves with User-Agent HTTPheader (1.2.5).
Builtin types and fundamental grammars are informative notnormative (2.3.1,2.3.1.1,2.3.1.2,Appendix P )
Updated section to match SRGS 1.0 specification (3)
description of how semantic interpretations are mapped toform variables (3.1.6).
Updated section to match SSML 1.0 specification (4)
reserved the variable namespace "_$" for internal use (5.1).
use of variables and form items with names "session","application", "document", and "dialog" not recommended (5.1.2).
Added Recommendation that metadata information is expressedin <metadata> rather than <meta>; removed recommendedmetadata information using <meta>; added recommendedmetadata information using RDF schema and Dublin Core properties(6.2)
Changed conformance behavior when interpreter encountersproperties it cannot process: it must (rather than should) notthrown an error.unsupported.property and must (rather thanshould) ignore the property (6.3.1)
DTD is now Informative rather than Normative (Appendix B)
set of required audio formats (Appendix E).
Replaced "audio/wav" with "audio/x-wav" in examples, andadded note that the media type "audio/wav" will be adopted whenofficially registered with the IETF (Appendix E)
revised definition of Conforming VoiceXML Processor,including requirement to support syntax and semantics of allelements described in this document (Appendix F).
Conforming document section references schema rather DTDconstraints (AppendixF)
Conformance statement reflect that DTD is informative butschema is normative. A conforming document must specify thevoiceXML namespace on the root element. The version="2.0"attribute must also be present. It is recommended to provide"xsi:schemaLocation" to indicate the location of the VoiceXMLschema. The DOCTYPE declaration is optional. The behavior of aVoiceXML processor is undefined when encountering documents withnon-VoiceXML designated root elements (Appendix F)
Revised description of how VoiceXML can address accessibilityrequirements and issues (Appendix H)
reusability appendix (Appendix K).
Added references appendix (Appendix M)
Added appendix describing VoiceXML media type and file suffixincluding link to IETF memo for registration of VoiceXML Mediatype (Appendix N)
Definition of normative schema for VoiceXML. This usesvarious other schema to adapt definitions from core schemas inthe grammar and synthesis specifications (Appendix O)
Verified schema with XML Spy 4.4, XSV (June 2002 version) andXerces 2 (Java and C++ versions) (Appendix O)
Added link to complete set of schema required for VoiceXML2.0 (Appendix O)

Appendix K —Reusability

K.1 Reusable dialog components

Definition: A packaged application fragment designed tobe invoked by arbitrary applications or other Reusable DialogComponents. A Reusable Dialog Component (RDC) encapsulates thecode for an interaction with the caller.

Reusable dialog components provide pre-packaged functionality"out-of-the-box" that enables developers to quickly buildapplications by providing standard default settings and behavior.They shield developers from having to worry about many of theintricacies associated with building a robust speech dialog,e.g., confidence score interpretation, error recovery mechanisms,prompting, etc. This behavior can be customized by a developer ifnecessary to provide application-specific prompts, vocabulary,retry settings, etc.

In this version of VoiceXML, the only authentic reusablecomponent calling mechanisms are <subdialog> and<object>. Components called this way follow a model similarto subroutines in programming languages: the component isconfigured by a well-defined set of parameters passed to thecomponent, the component has a relatively constrained interactionwith the calling application, the component returns awell-defined result, and control returns automatically to thepoint from which the component was called. This has all thesignificant advantages of modularity, reentrancy, and easy reuseprovided by subroutines. Of the two kinds of components, only<subdialog> components are guaranteed to be as portable asVoiceXML itself. On the other hand, <object> components maybe able to package advanced, reusable functionality that has notyet been introduced into the standard.

K.2 Templates and samples

Although reusable dialog components have the advantages ofmodularity, reentrancy, and easy reuse as described above, thedisadvantage of such components is that they must be designedvery carefully with an eye to reuse, and even with the mostcareful of designs it is possible that the application developerwill encounter situations for which the component cannot beeasily configured to handle the application requirements. Inaddition, while the constrained interaction of a component withits calling environment makes it possible for the componentdesigner to create a component that works predictably indisparate environments, it also may make the user's interactionwith the component seem disconnected from the rest of theapplication.

In such situations the application developer may wish to reuseVoiceXML source code in the form of samples and templates -samples designed for easy customizability. Such code is moreeasily tailored for and integrated into a particular application,at the expense of modularity and rentrancy.

Such templates and samples can be created by separatinginteresting VoiceXML code from a main dialog and thendistributing that code by copy for use in other dialogs. Thisform of reusability allows the user of the copied VoiceXML codeto modify it as necessary and continue to use their modifiedversion indefinitely.

VoiceXML facilitates this form of reusability by preservingthe separation of state between form elements. In this regard,VoiceXML and[HTML] aresimilar. An HTML table can be copied from one HTML page toanother because the table can be displayed regardless of thecontext before or after the table element.

Although parameterizability, modularity, and maintainabilitymay be sacrificed with this approach, it has the advantage ofbeing simple, quick, and eminently customizable.

Appendix L—Acknowledgements

This W3C specification is based upon VoiceXML 1.0 submitted bythe VoiceXML Forum in May 2000. The VoiceXML Forum authors were:Linda Boyer, IBM; Peter Danielsen, Lucent Technologies; JimFerrans, Motorola; Gerald Karam, AT&T; David Ladd, Motorola;Bruce Lucas, IBM; Kenneth Rehor, Lucent Technologies.

This version was written by the participants in the W3C VoiceBrowser Working Group. The following have significantlycontributed to writing this specification:

Paolo Baggia, Loquendo
Daniel C. Burnett, Nuance Communications
Emily Candell, Comverse
Jerry Carter, Invited Expert
Deborah Dahl, Invited Expert
Peter Danielsen, Lucent (until October 2002)
Martin Dragomirecky, Cisco
Jim Ferrans, Motorola
Andrew Hunt, ScanSoft
Gerald Karam, AT&T
Dave Ladd, Dynamicsoft
Paul Lamere, Sun Microsystems
Bruce Lucas, IBM
Scott McGlashan, HP
Mitsuru Oshima, General Magic
Brad Porter, Tellme
Gavriel Raanan, NMS Communications
Ken Rehor, Vocalocity
Steph Tryphonas, Tellme

The Working Group would like to thank Dave Raggett and JimLarson for their invaluable management support.

Appendix M —References

M.1. NormativeReferences

[CSS2]: "Cascading Style Sheets, level 2, CSS2Specification ", Bos et al. W3C Recommendation, May1998
See http://www.w3.org/TR/REC-CSS2/
[ECMASCRIPT]: "Standard ECMA-262 ECMAScript LanguageSpecification ", Standard ECMA-262, December 1999.Seehttp://www.ecma-international.org/publications/standards/Ecma-262.htm
[RFC1521]: "MIME (Multipurpose Internet Mail Extensions) PartOne: Mechanisms for Specifying and Describing the Format ofInternet Message Bodies ", IETF RFC 1521, 1993See http://www.ietf.org/rfc/rfc1521.txt
[RFC2396]: "Uniform Resource Identifiers (URI): GenericSyntax ", IETF RFC 2396, 1998.
See http://www.ietf.org/rfc/rfc2396.txt
[RFC2616]: "Hypertext Transfer Protocol -- HTTP/1.1", IETF RFC 2616, 1999.
See http://www.ietf.org/rfc/rfc2616.txt
[RFC2806]: "URLs for Telephone Calls ", IETF RFC2806, 2000.
See http://www.ietf.org/rfc/rfc2806.txt
[RFC3066]: "Tags for the Identification of Languages", IETF RFC 3066, 2001.
Note that[XML] adoptedRFC3066 through anerrata as of 2001-02-22. RFC3066 obsoletes[RFC1766] .See http://www.ietf.org/rfc/rfc3066.txt
[SCHEMA1]: "XML Schema Part 1: Structures ". Thompsonet al. W3C Recommendation, May 2001.
See http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/
[SRGS]: "Speech Recognition Grammar Specification Version1.0 ". Hunt and McGlashan. W3C Proposed Recommendation,December 2003.
See http://www.w3.org/TR/2003/PR-speech-grammar-20031218/
[SSML]: "Speech Synthesis Markup Language Version1.0 ". Burnett, Walker and Hunt. W3C CandidateRecommendation, December 2003.
See http://www.w3.org/TR/2003/CR-speech-synthesis-20031218/
[UNICODE]: "The Unicode Standard ". The UnicodeConsortium.
See http://www.unicode.org/unicode/standard/standard.html
[XML]: "Extensible Markup Language (XML) 1.0 ".Bray et al. W3C Recommendation.
See http://www.w3.org/TR/2000/REC-xml-20001006
[XML-BASE]: "XML Base ", J. Marsh, editor, W3CRecommendation, June 2001.
See http://www.w3.org/TR/2001/REC-xmlbase-20010627/.
[XMLNAMES]: "Namespaces in XML ". Bray et al. W3CRecommendation, January 1999.
See http://www.w3.org/TR/1999/REC-xml-names-19990114/

M.2. InformativeReferences

[ATAGIO]: "Authoring Tool Accessibility Guidelines1.0 ", Treviranus et al. W3C Recommendation, February2000.
See http://www.w3.org/TR/2000/REC-ATAG10-20000203/
[DC]: "Dublin CoreMetadata Initiative ", a Simple Content DescriptionModel for Electronic Resources.
See http://dublincore.org/
[HTML]: "HTML 4.01 Specification ", Dave Raggettet al. W3C Recommendation, December 1999.
See http://www.w3.org/TR/1999/REC-html401-19991224/
[IANA]: "IANA Character Sets ", IANA.See http://www.iana.org/assignments/character-sets
[ISO4217]: "ISO4217:2001 Codes for the representation of currencies andfunds ", ISO, 2001
See http://www.iso.ch/
[JSAPI]: "Java Speech API ", Sun Microsystems,Inc.
Seehttp://java.sun.com/products/java-media/speech/index.jsp
[JSGF]: "JSpeech Grammar Format ", Andrew Hunt,W3C Note, June 2000.
See http://www.w3.org/TR/2000/NOTE-jsgf-20000605/
[NLSML]: "Natural Language Semantics Markup Language for theSpeech Interface Framework ", Deborah A. Dahl. W3CWorking Draft, November 2000
See http://www.w3.org/TR/2000/WD-nl-spec-20001120/
[RDF-SYNTAX]: "Resource Description Framework (RDF) Model andSyntax Specification ", Ora Lassila and Ralph R.Swick. W3C Recommendation, February 1999.
See http://www.w3.org/TR/REC-rdf-syntax/
[RDF-SCHEMA]: "Resource Description Framework (RDF) SchemaSpecification 1.0 ", Dan Brickley and R.V. Guha. W3CCandidate Recommendation, March 2000.
See http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
[RFC1766]: "Tags for the Identification of Languages", IETF RFC 1766, 1995
Note that[XML] adoptedRFC3066 through anerrata as of 2001-02-22.[RFC3066] obsoletes RFC1766.See http://www.ietf.org/rfc/rfc1766.txt
[RFC2119]: "Key words for use in RFCs to Indicate RequirementLevels ", IETF RFC 2119, 1997.
See http://www.ietf.org/rfc/rfc2119.txt
[RFC2361]: "WAVE and AVI Codec Registries ", IETF RFC2361, 1998.
See http://www.ietf.org/rfc/rfc2361.txt
[SISR]: "Semantic Interpretation for SpeechRecognition ", Luc Van Tichelen. W3C Working Draft,April 2003.
Seehttp://www.w3.org/TR/2003/WD-semantic-interpretation-20030401/
[UAAGIO]: "User Agent Accessibility Guidelines 1.0", Jacobs et al. W3C Proposed Recommendation, October 2002.See http://www.w3.org/TR/2002/PR-UAAG10-20021016/
[VOICEXML-1.0]: "Voice eXtensible Markup Language 1.0 ",Boyer et al, W3C Note, May 2000.
See http://www.w3.org/TR/2000/NOTE-voicexml-20000505/
[WAI-WEBCONTENT]: "Web Content Accessibility Guidelines 1.0", Chisholm et al. W3C Recommendation, May 1999See http://www.w3.org/TR/WAI-WEBCONTENT/
[XAG]: "XML Accessibility Guidelines ",Dardailler et al. W3C Working Draft, October 2002See http://www.w3.org/TR/xag.html

Appendix N —Media Type and FileSuffix

The W3C Voice Browser Working Group has applied to IETF toregister a media type for VoiceXML. The requested media type isapplication/voicexml+xml.

The W3C Voice Browser Working Group has adopted the conventionof using the ".vxml" filename suffix for VoiceXML documents.

Appendix O —VoiceXML XML Schema Definition

This section is Normative.

The XML Schema definition for VoiceXML is located athttp://www.w3.org/TR/voicexml20/vxml.xsd.

The VoiceXML schema depends upon other schemas defined in theVoiceXML namespace:

vxml-datatypes.xsd : definition of datatypesused in the VoiceXML schema
vxml-attribs.xsd : definition of attributes andattribute groups used in the VoiceXML schema
vxml-grammar-restriction.xsd: this schemareferences the no-namespace schema of the Speech RecognitionGrammar Specification 1.0[SRGS] and restricts some of its definitions forembedding in the VoiceXML namespace.
vxml-grammar-extension.xsd: this schemareferences vxml-grammar-restriction.xsd and extends some of itsdefinitions for VoiceXML.
vxml-synthesis-restriction.xsd: this schemareferences the no-namespace schema of the Speech Synthesis MarkupLanguage 1.0[SSML] andextends as well as restricts some of its definitions forembedding in the VoiceXML namespace.
vxml-synthesis-extension.xsd: this schemareferences vxml-synthesis-restriction.xsd and extends some of itsdefinitions for VoiceXML.

The complete set of Speech Interface Framework schema requiredfor VoiceXML 2.0 is availablehere.

Appendix P — BuiltinGrammar Types

The <field> type attribute inSection 2.3.1 is used to specify a builtingrammar for one of the fundamental types. Platform support forfundamental builtin grammars is optional. If a platform doessupport builtin types, then it must follow the description givenin this appendix as closely as possible, including all thebuiltins for a given language.

Each builtin type has a convention for the format of the valuereturned. These are independent of language and of theimplementation. The return type for builtin fields is a stringexcept for the boolean field type. To access the actualrecognition result, the author can reference the <field>shadow variablename$.utterance. Alternatively, thedeveloper can access application.lastresult$, whereapplication.lastresult$.interpretation has the same string valueas application.lastresult$.utterance.

The builtin types are defined in such a way that a VoiceXMLapplication developer can assume some consistency of user inputacross implementations. This permits help messages and otherprompts to be independent of platform in many instances. Forexample, the boolean type's grammar should minimally allow"yes" and "no" responses in English, but each implementation isfree to add other choices, such as "yeah" and "nope".

In cases where an application requires specific behavior ordifferent behavior than defined for a builtin, it should use anexplicit field grammar. The following are circumstances in whichan application must provide an explicit field grammar in order toensure portability of the application with a consistent userinterface:

A platform is not required to implement a grammar that acceptsall possible values that might be returned by a builtin. Forinstance, the currency builtin defines the return valueformatting for a very broad range of currencies ([ISO4217]). The platform isnot required to support spoken input that includes any of theworld's currencies since that can negatively impact recognitionaccuracy. Similarly, the number builtin can return positive ornegative floating point numbers but the grammar is not requiredto support all possible spoken floating point numbers.
Builtins are also limited in their ability to handleunderspecified spoken input. For instance, "20 peso" cannot beresolved to a specific[ISO4217] currency code because the "peso" isthe name of the currency of numerous nations. In such cases theplatform may return a specific currency code according to thelanguage or may omit the currency code.

All builtin types must support both voice and DTMF entry.

The set of accepted spoken input for each builtin type isplatform dependent and will vary by language.

The value returned by a builtin type can be read out using the<say-as> element. VoiceXML extends <say-as> in[SSML] by adding'interpret-as' values corresponding to each builtin type.These values take the form "vxml:<type>" wheretype is a builtin type. The precise rendering of builtintypes is platform-specific and will vary by language.

The builtin types are:

Table 66: Builtin Types
boolean	Inputs include affirmative andnegative phrases appropriate to the current language. DTMF 1 isaffirmative and 2 is negative. The result is ECMAScript true foraffirmative or false for negative. The value will be submitted asthe string "true" or the string "false". If the field value issubsequently used in <say-as> with the interpret-as value"vxml:boolean", it will be spoken as an affirmative or negativephrase appropriate to the current language.
date	Valid spoken inputs include phrasesthat specify a date, including a month day and year. DTMF inputsare: four digits for the year, followed by two digits for themonth, and two digits for the day. The result is a fixed-lengthdate string with format yyyymmdd, e.g. "20000704". If the year isnot specified, yyyy is returned as "????"; if the month is notspecified mm is returned as "??"; and if the day is not specifieddd is returned as "??". If the value is subsequently used in<say-as> with the interpret-as value "vxml:date", it willbe spoken as date phrase appropriate to the current language.
digits	Valid spoken or DTMF inputs includeone or more digits, 0 through 9. The result is a string ofdigits. If the result is subsequently used in <say-as> withthe interpret-as value "vxml:digits", it will be spoken as asequence of digits appropriate to the current language. A usercan say for example "two one two seven", but not "twenty onehundred and twenty-seven". A platform may support constructssuch as "two double-five eight".
currency	Valid spoken inputs include phrasesthat specify a currency amount. For DTMF input, the "*" key willact as the decimal point. The result is a string with the formatUUUmm.nn, where UUU is the three character currency indicatoraccording to ISO standard 4217[ISO4217], or mm.nn if the currency is notspoken by the user or if the currency cannot be reliablydetermined (e.g. "dollar" and "peso" are ambiguous). If the fieldis subsequently used in <say-as> with theinterpret-as value "vxml:currency", it will be spoken as acurrency amount appropriate to the current language.
number	Valid spoken inputs include phrasesthat specify numbers, such as "one hundred twenty-three", or"five point three". Valid DTMF input includes positive numbersentered using digits and "*" to represent a decimal point. Theresult is a string of digits from 0 to 9 and may optionallyinclude a decimal point (".") and/or a plus or minus sign.ECMAScript automatically converts result strings to numericalvalues when used in numerical expressions. The result must notuse a leading zero (which would cause ECMAScript to interpret asan octal number). If the field is subsequently used in<say-as> with the interpret-as value "vxml:number", itwill be spoken as a number appropriate to the currentlanguage.
phone	Valid spoken inputs include phrasesthat specify a phone number. DTMF asterisk "*" represents "x".The result is a string containing a telephone number consistingof a string of digits and optionally containing the character "x"to indicate a phone number with an extension. For North America,a result could be "8005551234x789". If the field is subsequentlyused in <say-as> with the interpret-as value "vxml:phone",it will be spoken as a phone number appropriate to the currentlanguage.
time	Valid spoken inputs include phrasesthat specify a time, including hours and minutes. The result is afive character string in the format hhmmx, where x is one of "a"for AM, "p" for PM, "h" to indicate a time specified using 24hour clock, or "?" to indicate an ambiguous time. Input can bevia DTMF. Because there is no DTMF convention for specifyingAM/PM, in the case of DTMF input, the result will always end with"h" or "?". If the field is subsequently used in <say-as>with the interpret-as value "vxml:time", it will be spoken asa time appropriate to the current language.

An example of a <field> element with a builtin grammartype:

<field name="lo_fat_meal" type="boolean">  <prompt>    Do you want a low fat meal on this flight?  </prompt>  <help>    Low fat means less than 10 grams of fat, and under    250 calories.  </help>  <filled>    <prompt>      I heard <emphasis><say-as interpret-as="vxml:boolean">      <value expr="lo_fat_meal"/></say-as></emphasis>.    </prompt>  </filled></field>

In this example, the boolean type indicates that inputs arevarious forms of true and false. The value actually put into thefield is either true or false. The field would be read out usingthe appropriate affirmative or negative response in prompts.

In the next example, digits indicates that input will bespoken or keyed digits. The result is stored as a string, andrendered as digits using the <say-as> with"vxml:digits" as the value for the interpret-as attribute, i.e., "one-two-three", not "one hundred twenty-three". The<filled> action tests the field to see if it has 12 digits.If not, the user hears the error message.

<field name="ticket_num" type="digits">  <prompt>     Read the 12 digit number from your ticket.  </prompt>  <help>The 12 digit number is to the lower left.</help>  <filled>     <if cond="ticket_num.length != 12">       <prompt>         Sorry, I didn't hear exactly 12 digits.       </prompt>       <assign name="ticket_num" expr="undefined"/>     <else/>       <prompt>I heard <say-as interpret-as="vxml:digits">          <value expr="ticket_num"/></say-as>       </prompt>     </if>  </filled></field>

The builtin boolean grammar and builtin digits grammar can beparameterized. This is done by explicitly referring to builtingrammars using a platform-specific builtin URI scheme and using aURI-style query syntax of the formtype?param=value in the src attribute of a<grammar> element, or in the type attribute of a field, forexample:

<grammar src="builtin:dtmf/boolean?y=7;n=9"/><field type="boolean?y=7;n=9">   <prompt>     If this is correct say yes or press seven, if not, say no or press nine.  </prompt></field><field type="digits?minlength=3;maxlength=5">   <prompt>Please enter your passcode</prompt></field>

Where the <grammar> parameterizes the builtin DTMFgrammar, the first <field> parameterizes the builtin DTMFgrammar (the speech grammar will be activated as normal) and thesecond <field> parameterizes both builtin DTMF and speechgrammars. Parameters which are undefined for a given grammar typewill be ignored; for example, "builtin:grammar/boolean?y=7".

The digits and boolean grammars can be parameterized asfollows:

Table 67: Digit and Boolean GrammarParameterization
digits?minlength=n	A string of at leastn digits.Applicable to speech and DTMF grammars. If minlength conflictswith either the length or maxlength attributes then aerror.badfetch event is thrown.
digits?maxlength=n	A string of at mostn digits.Applicable to speech and DTMF grammars. If maxlength conflictswith either the length or minlength attributes then aerror.badfetch event is thrown.
digits?length=n	A string of exactlyn digits.Applicable to speech and DTMF grammars. If length conflicts witheither the minlength or maxlength attributes then aerror.badfetch event is thrown.
boolean?y=d	A grammar that treats the keypressd as an affirmative answer. Applicable only to the DTMFgrammar.
boolean?n=d	A grammar that treats the keypressd as a negative answer. Applicable only to the DTMFgrammar.

Note that more than one parameter may be specified separatedby ";" as illustrated above. When a <grammar> element withthe mode set to "voice" (the default value) is specified in a<field>, it is in addition to the default speech grammarmplied by the type attribute of the field. Likewise, when a<grammar> element with the mode set to "dtmf" is specifiedin a <field>, it is in addition to the default DTMFgrammar.

table of contents

[8]ページ先頭