UT CS 378 - Speech Processing - Present, Past and Future


CS 378 Natural Language Processing
***
Speech Processing: Present, Past and Future

Inge M. R. De Bleecker
Department of [email protected] 14, 2003

Slide 2: Main Types of NLP Applications
• Text processing: information retrieval and search engines, information extraction, text summarization, machine translation, question-answering…
• Speech processing: speech recognition (ASR) over-the-telephone (OTT) and dictation systems (desktop), speaker verification, text-to-speech (TTS)

Slide 3: Overview
• Speech industry: history (late 80's to present)
• Practical overview of current applications and their future directions:
  – Speech recognition accuracy
  – Text-to-speech accuracy
  – Usability and design
  – Application building tools
• Working in the speech industry

Slide 4: History: Late Eighties
Sentiment: OTT ASR finally ready for commercial applications…
Technology: OTT speaker-independent (SI) discrete digits/yes-no apps. Word-based language models. TTS mostly used for numbers, if used at all. Pre-recorded strings much more common.
Applications: simple in structure and functionality.
  OTT: banking, e.g. ask for account balance.
  Desktop: first dictation systems, medical applications.
Companies: few. Small research-oriented companies, or research arms of big companies. E.g. Dragon Systems, VPC, VCS, Kurzweil, BBN, AT&T, …

Slide 5: History: Early Nineties
Sentiment: credibility and usability of apps grow. Multilingual developments.
Technology: OTT SI continuous digits/yes-no/command-word apps. Move to phoneme-based language models.
Applications:
  OTT: still simple, system-directed dialog (vs. user-directed, mixed-initiative).
  Desktop: more dictation systems, command and control systems (user-directed).
Companies: more companies pop up. Most grow out of research communities.

Slide 6: History: Mid to Late Nineties
Technology: maturing of technologies used.
Companies:
  – overall growth
  – dirty politics (L & H)
  – mergers and buyouts start (still ongoing today)

Slide 7: History: Late Nineties to Present
Technology: maturing of technology continues
  – better recognition accuracy
  – unrestricted ASR input (natural speech)
  – move to more sophisticated dialog systems (see next slide)
  – tool standardization
Applications: wider use of apps. More attention to usability, dialog design, etc.

Slide 8: Dialog System Architecture
[Diagram: ASR -> Parser -> Reasoning -> Output Generation -> TTS]

Slide 9: Speech Recognition Accuracy
Present: reasonable accuracy on natural speech. Most systems still use a grammar to help the recognizer. Grammars are written in VoiceXML or a vendor-specific language, and are not very sophisticated from a linguistics point of view. Some systems are (theoretically) purely statistical. E.g. Nuance's Accuroute.
Future: need to add more linguistic principles to current statistical methods. Make signal processing more robust, encourage reusability.

Slide 10: TTS Accuracy
Present: getting better all the time. During the last few years, additional research in prosody and intonation has paid off: more natural-sounding speech. Also deals with abbreviations, etc. Current TTS can be used to patch up 'real' speech. E.g. AT&T, Scansoft (Speechworks).
Future: probably never a complete substitute for pre-recorded strings.

Slide 11: Usability – Dialog Design
Present
Dialog design (VUI) is becoming more sophisticated through
  – use of natural speech input
  – mixed-initiative dialogs (more complicated for novice users)
  – chatty applications which provide graceful ways of dealing with low-accuracy confirmations and errors, fall-back to system-directed dialog, …
  – use of persona: e.g. Bell Canada's Emily
Future
Continued improvements in dialog design are necessary (e.g. usability studies). Dialog design is easier with current (and future) tools, but… still an art!
It is (too) easy to design bad speech applications…

Slide 12: Usability – Other Issues
Present
  – Natural language generation (NLG) is not receiving much attention
  – Reasoning components very limited
Future
  – NLG needs to adapt to the user and conform more to human speech patterns
  – multimodal applications
  – multilingual systems
  – use of e.g. ontologies in reasoning components, …

Slide 13: Application Building Tools
Present
  – Standardization: VoiceXML and VoiceXML platforms (alternative: SALT)
  – Many platform companies: VoiceGenie, Bevocal, Audium, …
  – Also companies developing tools for platforms: Aptera
VoiceXML
  – World of VoiceXML: comprehensive site on all things VoiceXML
  – Free developer's resources: e.g. Bevocal
  – Small companies: can have a VoiceXML app hosted by a platform company
  – Big companies: in-house platforms (telco-industry grade equipment), quite costly
Future
Development of better tools that make it harder to build bad applications!

Slide 14: Speech Apps State-of-the-Art
Conclusion:
ASR and TTS are usable in real-world applications right now. To develop better applications, we need to improve accuracy, usability, etc., or… think about some radically different approaches to the current problems! (=> the "age-old" argument)

Slide 15: Working in the Speech Industry
Working for:
  – A speech recognition/text-to-speech company: a CS undergraduate can work on software development of tools, deployments. With the addition of some linguistics classes: dialog designer, QA of deployments, …
  – A VoiceXML platform company: general software development, …
  – A tools company: general software development, …
  – A consulting (services) company: dialog design, deployments.
Or…
Get a Ph.D. in EE and become a speech scientist who develops the next generation speech
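
The dialog system architecture slide (ASR, parser, reasoning, output generation, TTS) describes a pipeline in which each stage consumes the previous stage's output. The following is a minimal Python sketch of one dialog turn through such a pipeline, not an implementation taken from the slides. Every function name, the keyword "grammar", the confidence threshold, and the prompt wording are hypothetical, and the ASR and TTS stages are stubs that pass text through instead of handling audio. The reprompt branch illustrates the fall-back to system-directed dialog mentioned under dialog design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Parse:
    """Result of the parser stage (hypothetical structure)."""
    intent: Optional[str]
    confidence: float


def recognize(audio: str) -> str:
    # ASR stub: the "audio" is assumed to be already-transcribed text.
    return audio.strip().lower()


def parse(text: str) -> Parse:
    # Toy keyword grammar; real systems use far richer grammars or statistics.
    if "balance" in text:
        return Parse("account_balance", 0.9)
    if text in ("yes", "no"):
        return Parse(text, 0.95)
    return Parse(None, 0.0)


def reason(result: Parse, threshold: float = 0.5) -> str:
    # Reasoning stub: choose the next action, falling back to a
    # system-directed reprompt when nothing was understood with confidence.
    if result.intent is None or result.confidence < threshold:
        return "reprompt"
    if result.intent == "account_balance":
        return "report_balance"
    return "acknowledge"


def generate(action: str) -> str:
    # Output generation: map an action to a prompt string (assumed wording).
    prompts = {
        "report_balance": "Here is your account balance.",
        "acknowledge": "Okay.",
        "reprompt": "Sorry, I did not catch that. Please say balance, yes, or no.",
    }
    return prompts[action]


def synthesize(text: str) -> str:
    # TTS stub: wrap the text instead of producing audio.
    return f"<tts>{text}</tts>"


def run_turn(audio: str) -> str:
    # One pass through ASR -> parser -> reasoning -> output generation -> TTS.
    return synthesize(generate(reason(parse(recognize(audio)))))


if __name__ == "__main__":
    print(run_turn("What is my balance?"))
    print(run_turn("Tell me a joke"))
```

Keeping each stage behind its own function means a stub can later be swapped for a real recognizer or synthesizer without touching the other stages.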

