MIT 6 893 - Controlling Jaim Through Speech Commands - D2587292

School name Massachusetts Institute of Technology

Course 6 893-

Oxygen Alliance 2003 Workshop – February 24-28, 2003

SpeechBuilder Hands-on Activity: Controlling Jaim Through Speech Commands

1 Introduction

In this assignment, we will use SpeechBuilder and the Galaxy speech processing system to create a speech interface to our instant messenger client, Jaim. The objective is to allow the user to use voice commands to perform various functions in Jaim. Your first and primary task will be to add support for connection commands, such as “Connect me to Bob.” The audio for the voice commands will come either from your handheld or your tablet PC. The Galaxy system and SpeechBuilder will both run on your tablet. Your Jaim program will connect to Galaxy using the “frame relay”, a special Galaxy server that allows external applications to communicate with the speech processing system.

2 Learn the SpeechBuilder Interface

• First, you need to make a SpeechBuilder domain for the commands that you wish to support. Access SpeechBuilder on your tablet by going to:

    http://localhost/SpeechBuilder/SpeechBuilder.cgi

2.1 Domains

Your login is guest; the password is also guest. You will have one domain in your account, named “house.” This is a small toy domain you can use to get familiar with the SpeechBuilder interface.

• Once you get to the main splash screen, select the house domain and hit “Edit.” When you select a domain, you start at the action/attribute editing screen. In SpeechBuilder, you navigate with the drop-box at the top right of the screen (the one that says “Select another domain”). In this screen, you can view the details of a particular action or attribute.

2.2 Editing actions and attributes

• Take a look at the “status” action by selecting “status” in the action drop box and hitting the “Edit” button.
You can add new sentences to the action by typing them in the editing window and hitting “Apply.” You can select sentences to edit in the selection window, and hit “Edit” to pull them into the editing window.

• Think of another way of asking about things in the house (e.g. “i am wondering if the lights are turned on in the living room”) and add it to the action. The example sentences in the “house” domain have various grammar syntax elements (e.g. “|”, “[ ]”). You can familiarize yourself with the syntax of these, but you don’t have to use them. Just type a sentence as you would say it in English. You don’t need any capital letters or punctuation.

• From this screen, you can also edit attributes. Select the “room” attribute and hit “Edit.” Add a new type of room to the list.

Once you add a room to the attribute “room,” it becomes interchangeable with all the other rooms in the action sentences. For example, if you add “bathroom,” you gain the ability to say “are the lights in the bathroom on” without explicitly adding this sentence to the “status” action. This generalization is a powerful feature of SpeechBuilder.

2.3 Back-end program location

As was discussed in the presentation, SpeechBuilder is able to configure a complete speech domain based on only a database table. However, in this lab we will use a different configuration, in which the domain connects to an external application. The external application, in our case, will eventually be Jaim. However, for the house domain we will use an “echo script”, a CGI script that just repeats what the system thinks you said.

• From the main navigation pull-down, select “Edit back-end script URL.” Make sure that the URL is http://localhost/cgi-bin/echo.cgi (the echo script). You shouldn’t need to edit it for now.

2.4 Compiling the domain

Next, we will “compile” the domain.
Compiling creates all the grammars and configuration files necessary for the Galaxy components to run properly.

• Click on the “Compile” button in the upper right of the SpeechBuilder screen. You may have to wait a few seconds as the domain is compiled. Skim over the output and see if you can figure out what is happening.

Note: You will need to compile the domain each time you make a change. No changes will take effect until you compile. Don’t worry if you get some chmod errors; that is normal.

2.5 Reduced sentences

SpeechBuilder can “reduce” your action sentences to a CGI-like meaning representation for you. This is done using the TINA natural language processor, which is also used to process the speech recognized during the runtime of your domain.

• Select “See reduced sentences” from the pulldown list and hit “Go.” Take a look at the sentences. Note that only the words specified in the attributes make it into the frame. TINA extracts the meaningful words in a sentence by processing it according to a hierarchical parsing grammar.

• Click on one of the sentences to see a graphical representation of the TINA parse tree. The grammar for the house domain is relatively flat, so you don’t see much hierarchy in the parse tree. We won’t be needing any concept hierarchy for our Jaim domain, either. However, a developer can enforce a concept hierarchy in the parsing grammar by using hierarchical concepts (as outlined in the presentation).

2.6 Talking to your domain

Now that you have compiled your domain, you can actually talk to it!

• You need to start several things on your machine to make this possible:

• We will use an open source speech synthesizer called Festival. Festival is not part of Galaxy, but we will use it for now (in the future, SpeechBuilder will give you the ability to configure your own concatenative speech synthesizer). Open a new window and start Festival by typing

    festival --server

• Open a new window and start the Galaxy servers.
You will need to go to the domain directory; do so by typing

    cd /home/sls/Galaxy/SpeechBuilder/users/guest/DOMAIN.house

Then, open a new window, and start the domain by typing

    ./oxclass.cmd yes yes

The first “yes” tells the script to pop up windows to show you the output of the various servers that it will run (e.g. speech recognition, natural language processing, etc.). The second “yes” tells the domain to use Festival. In the future, you can say “no” for the first switch if you don’t care about the output of the servers (but it is usually useful for debugging). Take a look at what servers get started. Try to figure out what each one of them is trying to do.

• Adjust the mixer. On the tablet, go to the KDE menu → Multimedia → Sound Mixer. Then adjust your main volume up to about 80% of maximum. On the iPAQ, go
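
The attribute generalization from Section 2.2 and the sentence reduction from Section 2.5 can be pictured with a small Python sketch. It is only an illustration, not how SpeechBuilder or TINA work internally: TINA uses a hierarchical parsing grammar, whereas this sketch does plain substring matching. The "device" attribute, the {name} template syntax, and the helper names expand and reduce_sentence are all assumptions made for the example.

```python
import re
import string
from itertools import product

# Hypothetical stand-ins for the "house" domain. Only the "room" attribute
# and the "status" action are named in the handout; the rest is assumed.
ATTRIBUTES = {
    "room": ["living room", "kitchen", "bathroom"],
    "device": ["lights", "television"],
}
ACTIONS = {
    "status": ["are the {device} in the {room} on"],
}


def expand(template, attributes):
    """Yield every sentence obtained by filling the template's attribute slots."""
    # Formatter().parse yields (literal, field_name, spec, conversion) tuples.
    names = [field for _, field, _, _ in string.Formatter().parse(template) if field]
    for combo in product(*(attributes[name] for name in names)):
        yield template.format(**dict(zip(names, combo)))


def reduce_sentence(sentence, attributes):
    """Keep only the attribute values found in the sentence, as key/value pairs."""
    frame = {}
    for name, values in attributes.items():
        # Try longer values first so multi-word values are matched whole.
        for value in sorted(values, key=len, reverse=True):
            if re.search(r"\b" + re.escape(value) + r"\b", sentence):
                frame[name] = value
                break
    return frame


if __name__ == "__main__":
    for sentence in expand(ACTIONS["status"][0], ATTRIBUTES):
        print(sentence, "->", reduce_sentence(sentence, ATTRIBUTES))
```

In this sketch, adding “bathroom” to the room list makes “are the lights in the bathroom on” available without anyone typing that sentence, and reducing a longer sentence keeps only the words that belong to an attribute.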
