Open Domain Speech Translation: From Seminars and Speeches to Lectures

Christian Fügen*, Muntsin Kolss*, Matthias Paulik†, Sebastian Stüker*, Tanja Schultz†, Alex Waibel*†

*Interactive Systems Labs (ISL), Universität Karlsruhe (TH), Germany
{fuegen, kolss, stueker, waibel}@ira.uka.de

†Interactive Systems Labs (ISL), Carnegie Mellon University, Pittsburgh, PA, USA
{paulik, tanja}@cs.cmu.edu

ABSTRACT

This paper describes our ongoing work in domain-unlimited speech translation. We describe how we developed a lecture translation system by moving from speech translation of European Parliament Plenary Sessions and seminar talks to the open domain of lectures. We started with our 2006 speech recognition (ASR) and statistical machine translation (SMT) evaluation systems developed within the framework of TC-STAR (Technology and Corpora for Speech to Speech Translation) and CHIL (Computers in the Human Interaction Loop). The paper presents the speech translation performance of these systems on lectures and gives an overview of our final real-time lecture translation system.

1. INTRODUCTION

Growing international information structures and decreasing travel costs could make the dissemination of knowledge in this globalized world very easy – if only the language barrier could be overcome. Lectures are a very effective method of knowledge dissemination. Such personalized talks are the preferred format, since they allow speakers to tailor their presentation toward a specific audience and in return allow listeners to get the most relevant information through interaction with the speaker. In addition, personal communication fosters the exchange of ideas, allows for collaboration, and forms ties between distant units, e.g. scientific laboratories or companies.

At the same time it is desirable to allow the presenters of talks and lectures to speak in their native language, since, no matter how proficient in a foreign language, one will always feel more confident in the native tongue. To overcome this obstacle, human translators are currently the only solution. Unfortunately, translation services are often prohibitively expensive, such that many lectures are not given at all as a result of the language barrier. Modern machine translation techniques have the potential to provide translation services at no cost to a wide audience, making it possible to overcome the language barrier and bring people closer together.

This paper describes our ongoing work in unlimited-domain speech translation of lectures, starting from systems built within the framework of CHIL and TC-STAR.

CHIL [25], Computers in the Human Interaction Loop, aims at making significant advances in the fields of speaker localization and tracking, speech activity detection, and distant-talking automatic speech recognition. Therefore, in addition to the near- and far-field microphones, seminars were also recorded by calibrated video cameras. The long-term goal is the ability to recognize speech in a real reverberant environment, without any constraint on the number or distribution of microphones in the room, nor on the number of sound sources active at the same time.

TC-STAR [20], Technologies and Corpora for Speech-to-Speech Translation, is envisaged as a long-term effort to advance research in all core technologies for speech-to-speech translation (SST), which is a combination of automatic speech recognition (ASR), spoken language translation (SLT), and text-to-speech synthesis (TTS).
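An SST system of this kind is a cascade of the three components just named. The following is only a minimal sketch of such a cascade; the component interfaces (recognize, translate, synthesize) are hypothetical placeholders, not the actual TC-STAR systems.

```python
# Minimal sketch of a cascaded speech-to-speech translation (SST) pipeline:
# ASR -> SLT -> TTS. All interfaces here are hypothetical placeholders.
from typing import Protocol


class Recognizer(Protocol):
    def recognize(self, audio: bytes) -> str: ...    # audio -> source-language text


class Translator(Protocol):
    def translate(self, text: str) -> str: ...       # source text -> target text


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...    # target text -> audio


def translate_speech(audio: bytes, asr: Recognizer,
                     slt: Translator, tts: Synthesizer) -> bytes:
    source_text = asr.recognize(audio)        # ASR hypothesis (may contain errors)
    target_text = slt.translate(source_text)  # SMT output; ASR errors propagate here
    return tts.synthesize(target_text)        # synthesized target-language speech
```

One consequence of such a cascade is that recognition errors propagate into translation; Section 4 accordingly reports SMT results on both clean text and ASR input.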
The objective of the project is to make a breakthrough in SST that significantly reduces the gap between human and machine translation performance. The focus hereby is on the development of new algorithms and methods. So far the project targets a selection of unconstrained conversational speech domains – speeches and broadcast news – and three languages: European English, European Spanish, and Mandarin Chinese.

The paper is organized as follows: the developmental work started from our 2006 ASR and SMT evaluation systems for the European Parliament Plenary Sessions (EPPS, TC-STAR) and the NIST Rich Transcription evaluation RT-06S on seminars (CHIL). In Section 3, we first compare the different ASR systems of both domains and show how we merged these systems for lecture recognition. Furthermore, we present first results of acoustic and language model adaptation on the lecture domain. In Section 4, we give statistical machine translation results of our 2006 EPPS SMT evaluation system on text and ASR input for lectures. In addition, we explain in detail how we adapted our EPPS SMT system towards the more conversational style of lectures and present the corresponding machine translation results. Section 5 provides an overview of our real-time lecture translation system, and Section 6 concludes this paper.

2. DEVELOPMENT AND EVALUATION DATA

For the automatic speech recognition and statistical machine translation experiments on lectures, we selected three different lectures as development and evaluation data. The three lectures were given in English by the same non-native speaker on different topics. All lectures were recorded with close-talking microphones [3].

Dev: A 24 min talk that was held to give a broad overview of current research projects in our lab and is therefore ideal as a development set.

t035: A 35 min talk held as a conference keynote, only partly covered by the Dev talk, which gives us the opportunity to evaluate how our system behaves on an unseen domain.

t036+: A 31 min talk on the same topic as t035, but held in a different environmental setting and situation, which allows us to evaluate the robustness of our system.

For the ASR experiments we used the seminar part of the NIST RT-06S development data and the 2006 EPPS development data as additional data sources.

3. SPEECH RECOGNITION

In this section we first compare the 2006 evaluation systems for European Parliament Plenary Sessions [19] and CHIL seminars [4] and describe the development of a single system that performs almost as well as the respective evaluation systems on both domains. This is followed by the presentation of the system's performance on the lecture domain. Lectures are an ideal showcase for speaker and domain adaptation tasks, since the lecturer and the topic might be known in advance [1]. Therefore, we describe acoustic and language model adaptation results in the last part of this section. In contrast to the work in [3], in this paper we take the 2006 EPPS evaluation into consideration for the development of our
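The language model adaptation mentioned above is commonly realized by linearly interpolating a large background model with a small in-domain model. The excerpt does not spell out the paper's exact method, so the following is only a generic sketch of the standard technique, with hypothetical names and toy n-gram tables:

```python
# Generic sketch of language model adaptation via linear interpolation:
#   P_adapted(w | h) = lam * P_domain(w | h) + (1 - lam) * P_background(w | h)
# Illustrates the standard technique only; not the paper's exact recipe.

def interpolate_lm(p_background: dict, p_domain: dict, lam: float) -> dict:
    """Interpolate two n-gram probability tables keyed by (history, word)."""
    assert 0.0 <= lam <= 1.0
    events = set(p_background) | set(p_domain)
    return {
        ev: lam * p_domain.get(ev, 0.0) + (1.0 - lam) * p_background.get(ev, 0.0)
        for ev in events
    }

# Example: bias a broad background model toward lecture-domain wording.
p_bg = {(("we",), "present"): 0.010, (("speech",), "translation"): 0.002}
p_dom = {(("speech",), "translation"): 0.030, (("speech",), "recognition"): 0.020}
p_adapted = interpolate_lm(p_bg, p_dom, lam=0.3)
```

The weight lam would typically be tuned to minimize perplexity on held-out in-domain data, e.g. the Dev lecture described in Section 2.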

