Voice XML
Voice Markup Language
Presented by Hongliang Xu

Presentation Overview
- What is VoiceXML
- Introduction to VoiceXML
- Overview of VoiceXML
- A Sample VoiceXML Application
- Summary
- History

What is VoiceXML
- VoiceXML is a standard for voice-based communication.
- VoiceXML is an XML language that plays the same role in voice applications that HTML plays in web applications: it is the language of communication.
- Like other XML technologies, VoiceXML integrates seamlessly with existing web-based technologies and can be used with any existing server-side technology, such as ASP or Java servlets.

Goal of VoiceXML
VoiceXML's main goal is to bring the full power of web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management.

Concepts
- Dialogs and Subdialogs
- Sessions
- Grammars
- Events
- Links
- Application

Introduction to VoiceXML
- Two short examples
- The advantages of using VoiceXML
- The architectural design of VoiceXML
- Elements of the VoiceXML implementation

Hello World Example
Here are two short examples of VoiceXML. The first is the venerable "Hello World":

    <?xml version="1.0"?>
    <vxml version="1.0">
      <form>
        <block>Hello World!</block>
      </form>
    </vxml>

The top-level element is <vxml>, which is mainly a container for dialogs. There are two types of dialogs: forms and menus. Forms present information and gather input; menus offer choices of what to do next. This example has a single form, which contains a block that synthesizes and presents "Hello World!" to the user. Since the form does not specify a successor dialog, the conversation ends.

Another Short Example
The second example asks the user for a choice of drink and then submits it to a server script:

    <?xml version="1.0"?>
    <vxml version="1.0">
      <form>
        <field name="drink">
          <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
          <grammar src="drink.gram" type="application/x-jsgf"/>
        </field>
        <block>
          <submit next="http://www.drink.example/drink2.asp"/>
        </block>
      </form>
    </vxml>

The advantages of using VoiceXML
- Minimizes client/server interactions by specifying multiple interactions per document.
- Shields application developers from low-level and platform-specific details.
- Separates the user interaction code, which is given in VoiceXML, from service logic.
- Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers, and platform providers.
- Is easy to use for simple interactions, and yet provides language features to support complex dialogs.

The architectural design of VoiceXML
[Diagram labels: Voice User; Translation process; VoiceXML; Client (e.g. PC); Content Server; Client (e.g. mobile phone); VoiceXML Gateway]

Elements of the VoiceXML Implementation
[Diagram labels: Document Server; request; document; Interpreter Context; Interpreter; Implementation Platform]

Overview of VoiceXML
- Grammar
- Structure of VoiceXML Applications
- Different Types of Forms
- A Form Dialog
- A Menu Dialog
- Sub Dialogs within a Form
- Transition between Dialogs
- Variables in VoiceXML Dialogs
- Event Handling in VoiceXML

Grammar
A grammar defines the allowable inputs submitted by the user. The <grammar> element is used to provide a speech grammar that:
- specifies a set of utterances that a user may speak to perform an action or supply information, and
- provides a corresponding string value (in the case of a field grammar) or set of attribute-value pairs (in the case of a form grammar) to describe the information or action.

The <grammar> element is designed to accommodate any grammar format that meets these two requirements. At this time, VoiceXML does not specify a grammar format, nor does it require support of a particular grammar format. This is similar to the situation with recorded audio formats for VoiceXML, and with media formats in general for HTML.

Structure of VoiceXML applications
[Diagram labels: Root Document; Document1; Document2; Dialog 1; Document3; Dialog 2; Subdialog 1; Document4; Dialog 3; Subdialog 2]

Forms
A form contains:
- A set of form items. Form items are subdivided into field items (those that define the form's field item variables) and control
  items.
- Declarations of non-field item variables.
- Event handlers.
- "Filled" actions: blocks of procedural logic that execute when certain combinations of field items are filled in.

Form attributes are id and scope.

Different Types of Forms
- Directed Forms
- Mixed Initiative Forms

Directed Forms
The simplest and most common type of form is one in which the form items are executed exactly once, in sequential order, to implement a computer-directed interaction. Here is a weather information service that uses such a form:

    <form id="weather_info">
      <block>Welcome to the weather information service.</block>
      <field name="state">
        <prompt>What state?</prompt>
        <grammar src="state.gram" type="application/x-jsgf"/>
        <catch event="help">
          Please speak the state for which you want the weather.
        </catch>
      </field>
      <field name="city">
        <prompt>What city?</prompt>
        <grammar src="city.gram" type="application/x-jsgf"/>
        <catch event="help">
          Please speak the city for which you want the weather.
        </catch>
      </field>
      <block>
        <submit next="/servlet/weather" namelist="city state"/>
      </block>
    </form>

This dialog proceeds sequentially:

    C (computer): Welcome to the weather information service. What state?
    H (human): Help
    C: Please speak the state for which you want the weather.
    H: Georgia
    C: What city?
    H: Tblisi
    C: I did not understand what you said. What city?
    H: Macon

Mixed Initiative Forms
Directed forms implement rigid, computer-directed conversations. To make a mixed initiative form, where both the computer and the human direct the conversation, it must have one or more <initial> form items and one or more form-level grammars.

If a form has form-level grammars:
- Its fields can be filled in any order.
- More than one field can be filled as a result of a single user utterance.

Also, the form's grammars can be active when the user is in other dialogs. If a document has two forms on it, say a car rental form and a hotel reservation form, and both forms have grammars that are active for that document, a user could respond to a request for hotel reservation information with information about the car rental, and thus direct the
computer to talk about the car rental instead. The user can speak to any active grammar, and have fields set and actions taken in response.

A Form Dialog
- Fields
- Recording input
- Blocks and objects

Fields
A field specifies an input item to be gathered from the user. The field's type attribute is used to specify a built-in grammar for one of the fundamental types, and also specifies how its value is to be spoken if subsequently used in a value attribute in a prompt. An example:

    <field name="lo_fat_meal" type="boolean">
      <prompt>
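
To illustrate the mixed initiative forms described earlier, the directed weather form could be reworked roughly as follows. This is only a sketch: the form-level grammar file cityandstate.gram, the <initial> item name start, and the wording of the prompts are assumptions made for illustration and are not taken from the slides.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="weather_info">
    <!-- Form-level grammar (hypothetical file): one utterance may fill
         both the city and the state fields. -->
    <grammar src="cityandstate.gram" type="application/x-jsgf"/>

    <!-- Initial item: asks an open question before any field is visited. -->
    <initial name="start">
      <prompt>For what city and state would you like the weather?</prompt>
      <help>Please say the name of the city and the state.</help>
      <!-- After a second silence, mark the initial item as done so the
           form falls back to the directed field prompts below. -->
      <noinput count="2">
        <assign name="start" expr="true"/>
      </noinput>
    </initial>

    <field name="state">
      <prompt>What state?</prompt>
      <grammar src="state.gram" type="application/x-jsgf"/>
    </field>

    <field name="city">
      <prompt>What city?</prompt>
      <grammar src="city.gram" type="application/x-jsgf"/>
    </field>

    <!-- Form-level filled action: runs once both fields have values. -->
    <filled>
      <submit next="/servlet/weather" namelist="city state"/>
    </filled>
  </form>
</vxml>
```

With a form like this, a caller who answers the opening prompt with both a city and a state would fill both fields in one utterance. If only one field is filled, the remaining field is prompted for in the usual directed way.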