ICASSP03-Fuegen


RECENT ADVANCES IN LINGWEAR: A WEARABLE LINGUISTIC ASSISTANT FOR TOURISTS

Christian Fügen1, Tanja Schultz2, Jia-Cheng Hu2, Alex Waibel1,2

1Interactive Systems Labs, University of Karlsruhe, Am Fasanengarten 5, 76131 Karlsruhe, Germany
{fuegen,waibel}@ira.uka.de

2Interactive Systems Labs, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213, USA
{tanja,jiacheng,ahw}@cs.cmu.edu

ABSTRACT

In this paper we describe our recent advances in LingWear, a wearable linguistic assistant for tourists. LingWear allows uninformed users to find their way in foreign cities and to ask for information about sightseeing, accommodations, and other places of interest. Moreover, the system allows the user to communicate with local residents through integrated speech-to-speech translation. Furthermore, the graphical user interface (GUI) of LingWear also runs on small hand-held devices (e.g. Compaq's iPAQ). In this client-server solution, the main components of the system run on a wirelessly connected server. The user can query LingWear either by spontaneous speech or via the touch screen, and receives the system's responses either through the integrated speech synthesis or as display messages.

1. INTRODUCTION

Due to the rapid development in the area of hand-held devices, we expect their performance to become sufficient in the near future to run processor- and memory-intensive applications. It is therefore our belief that user-friendly multimodal interfaces, including speech recognition and translation, are within reach for small wearable devices.
Driven by this expectation we developed LingWear [1], a mobile tourist information system that allows uninformed users to find their way in foreign cities as well as to ask for information about sightseeing, accommodations, and other places of interest. Moreover, the system allows the user to communicate with local residents through integrated speech-to-speech translation. However, since current small hand-held devices lack sufficient computing power and memory, we adopted a client-server model with wireless communication to a LingWear server. This gives us early access to the newly developed platform and allows us to migrate the remaining modules to the hand-held device step by step.

The next section gives a short overview of LingWear's architecture and describes the available modes. In Section 3, we present our latest achievements in speech and language processing. Section 4 presents the translation module of LingWear and reports results of our experiments on domain portability, in which semantic grammars were extended by hand or by automatic learning for the new medical domain. In Section 5, we describe our client-server approach for LingWear. Section 6 concludes the paper and gives an outlook on future work.

2. ARCHITECTURE OF LINGWEAR

The implementation of LingWear follows the standard design principles of lightweight interfaces, which allow high flexibility and make it easy to add new modules. Following this concept, LingWear is based on a central communication server (ComServer). Although all messages are forced to pass through the communication server, this central communication has several advantages over distributed communication or communication via a bus:

• Since all modules connected to the ComServer are known to it, an error message can be returned if a module is not accessible.

• As a result of the direct communication between the modules, messages are sent solely to the individual module given by an ID.
Grouping of IDs allows message broadcasting to a group of modules.

• The direct communication reduces the processor load, since the remaining modules do not have to analyze messages.

2.1. Modes of LingWear

For a clear arrangement, we have divided LingWear into several modes, each of which is represented by a specific topic. The following modes are integrated in LingWear:

[Figure 1: Tour mode. Figure 2: Information mode. Figure 3: Translation mode.]

• The tour mode, displayed in Figure 1, presents information about sightseeing. The selection depends on the user's current location and preferences. User preferences are handled through a user model. Individual icons can be attached to the sightseeing places to indicate whether an attraction is open or closed.

• The navigation mode supports the user in finding the shortest route to specified places in the city. The route can be retrieved step by step, and additional information about sights can be presented along the way. We are currently investigating the usefulness of GPS-augmented navigation.

• The information mode, displayed in Figure 2, provides information about sightseeing or other places of interest as stored in a database. The information is presented to the user via images and short text descriptions.

• The translation mode, shown in Figure 3, enables non-native visitors to communicate with local residents, a necessary function in situations like making a hotel reservation or visiting a physician.

In addition to the presentation of information on the screen, speech output is synthesized. For English, German, and Arabic we currently use the speech synthesis system Festival [2]; for Japanese, the Fujitsu VoiceSeries provided by Animo Ltd.

3. SPEECH AND LANGUAGE PROCESSING

The speech recognizer used in LingWear was built using the Janus Recognition Toolkit, JRTk [3].
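To illustrate the ID-based routing and group broadcasting of the ComServer described in Section 2, here is a minimal sketch. The class layout, module names, and dispatch API are illustrative assumptions, not LingWear's actual interface:

```python
class ComServer:
    """Minimal sketch of a central communication server that routes
    messages to modules by ID and supports group broadcasting."""

    def __init__(self):
        self.modules = {}  # module ID -> handler callable
        self.groups = {}   # group name -> set of module IDs

    def register(self, module_id, handler, group=None):
        # Every connected module is known to the server by its ID.
        self.modules[module_id] = handler
        if group is not None:
            self.groups.setdefault(group, set()).add(module_id)

    def send(self, module_id, message):
        # Direct communication: only the addressed module sees the message.
        # An error is returned if the module is not accessible.
        if module_id not in self.modules:
            return "error: module %r not accessible" % module_id
        return self.modules[module_id](message)

    def broadcast(self, group, message):
        # Grouping of IDs allows broadcasting to a group of modules.
        return [self.send(mid, message)
                for mid in sorted(self.groups.get(group, ()))]
```

Because messages are addressed rather than put on a shared bus, modules that are not the recipient never have to parse them, which is the processor-load argument made above.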
In the current LingWear system we apply IBIS [4], our recently developed one-pass decoder, which is part of JRTk. Besides several other advantages over our old Janus three-pass search, such as lower memory usage and higher recognition speed, IBIS allows us to decode along context-free grammars in addition to the classical statistical n-gram language models (LMs).

3.1. Speech Recognition

The typical speech recognizer in LingWear consists of a fully continuous system using approx. 2,000 context-dependent acoustic models with 16 Gaussians per model. Cepstral mean normalization is used to compensate for channel variations. In addition to the mean-subtracted mel-cepstral coefficients, the first- and second-order derivatives are
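The front end described above, mean-subtracted cepstral coefficients augmented with first- and second-order time derivatives, can be sketched as follows. The delta-window width and the regression formula are common defaults, not values taken from the paper:

```python
import numpy as np

def cepstral_mean_normalize(cepstra):
    """Cepstral mean normalization: subtract the per-utterance mean
    of each coefficient to compensate for channel variations."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def deltas(feats, width=2):
    """Regression-based time derivatives over +/- `width` frames
    (a standard delta formula; width=2 is an assumed default)."""
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, width + 1))
    n = len(feats)
    return sum(i * (padded[width + i: n + width + i]
                    - padded[width - i: n + width - i])
               for i in range(1, width + 1)) / denom

def front_end(cepstra):
    """Stack mean-subtracted cepstra with their first- and
    second-order derivatives, frame by frame."""
    c = cepstral_mean_normalize(cepstra)
    d = deltas(c)          # first-order derivatives
    dd = deltas(d)         # second-order derivatives
    return np.hstack([c, d, dd])
```

For 13 cepstral coefficients per frame this yields a 39-dimensional feature vector per frame (static plus delta plus delta-delta), a layout typical of recognizers of this generation.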

