Toward Spoken Dialogue as Mutual Agreement

Susan L. Epstein1,2, Joshua Gordon4, Rebecca Passonneau3, and Tiziana Ligorio2
1Hunter College and 2The Graduate Center of The City University of New York, New York, NY, USA
3Center for Computational Learning Systems and 4Department of Computer Science, Columbia University, New York, NY, USA
[email protected], [email protected], [email protected], [email protected]

Abstract

This paper re-envisions human-machine dialogue as a set of mutual agreements between a person and a computer. The intention is to provide the person with a habitable experience that accomplishes her goals, and to provide the computer with sufficient flexibility and intuition to support them. The application domain is particularly challenging: for its vocabulary size, for the number and variety of its speakers, and for the complexity and number of the possible instantiations of the objects under discussion. The brittle performance of a traditional spoken dialogue system in such a domain motivates the design of a new, more robust social system, one where dialogue is necessarily represented on a variety of different levels.

Introduction

A spoken dialogue system (SDS) has a social role: it supposedly allows people to communicate with a computer in ordinary language. A robust SDS should support coherent and habitable dialogue, even when it confronts situations for which it has no explicit, pre-specified behavior. To ensure robust task completion, however, SDS designers typically produce systems that make a sequence of rigid demands on the user, and thereby lose any semblance of natural dialogue. The thesis of our work is that a dialogue should evolve as a set of agreements that arise from joint goals and the collaboration of communicative interaction (Clark and Schaefer, 1989). The role of metacognition here is to use both self-knowledge and learning to represent dialogue and to enhance the SDS.
As a result, dialogue should become both more habitable for the person and more successful for the computer. This paper discusses the challenges for an SDS in an ambitious domain, and describes a new, metacognitively oriented system under development to address the issues that arise in human-machine dialogue.

Our domain of investigation is the Heiskell Library for the Blind and Visually Impaired, a branch of The New York Public Library and part of The Library of Congress. Heiskell's patrons order their books by telephone, during conversation with a librarian. The volume of calls from its 5,028 active patrons, however, promises to outstrip the service currently provided by its 5 librarians.

The next section of this paper describes the challenges inherent in spoken dialogue systems. Subsequent sections describe a traditional SDS architecture, demonstrate the brittle behavior of an SDS built within it, and re-envision a new SDS within the structure of a cognitively plausible architecture. The paper then posits a paradigm that endows human-machine dialogue with metacognition, explains how metacognition is implemented in this re-envisioned system, and reports on the current state of its development.

Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Challenges in SDS Implementation

The social and collaborative nature of dialogue challenges an SDS in many ways. The spontaneity of dialogue gives rise to disfluencies, where a person repeats or interrupts herself, produces filled pauses or false starts, and self-repairs. Disfluencies play a fundamental role in dialogue, as signals for turn-taking (Gravano, 2009; Sacks, Schegloff and Jefferson, 1974) and for grounding to establish shared beliefs about the current state of mutual understanding (Clark and Schaefer, 1989).
Most SDSs handle the content of the user's utterances, but do not fully integrate the way they address utterance meaning, disfluencies, turn-taking, and the collaborative nature of grounding.

During dialogue, people simultaneously manage turn-taking and process speech. The complexity of speech recognition for multiple speakers, however, requires the SDS to have an a priori dialogue strategy that determines how much freedom it offers the user. An SDS that maintains system initiative completely controls the path of the dialogue, and dictates what the person may or may not say during her turn ("SAY 1 FOR ORDERS, SAY 2 FOR CUSTOMER SERVICE, OR…"). In contrast, habitable dialogue requires mixed initiative, where the user and the system share control of the path the dialogue takes. Of course, mixed initiative runs the risk that the system will find itself in a state unanticipated by its designer, and no longer respond effectively and collaboratively. Because fallback responses (e.g., asking the user to repeat or start over) are brittle, current mixed-initiative systems pre-specify how much initiative a user may take, and restrict that initiative to specific kinds of communicative acts.

An SDS receives a continuous stream of acoustic data. Automated Speech Recognition (ASR) translates it into discrete linguistic units (e.g., words and phonemes) represented as text strings. Such continuous speech recognition over a large vocabulary for arbitrary speakers presents a major challenge. The Heiskell Library task includes 47,665 distinct words from titles and author names, with a target user population that varies in gender, regional accent, native language, and age. Moreover, telephone speech is subject to imperfect transmission quality and background noise. For example, the word error rate (WER) for Let's Go Public! (Raux et al., 2005) went from 17% under controlled conditions to 68% in the fielded version.
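To make the WER figures above concrete, here is a minimal sketch (not from the paper) of how word error rate is conventionally computed: the word-level Levenshtein distance between a reference transcript and an ASR hypothesis, divided by the reference length. Function and variable names are illustrative.

```python
# Illustrative sketch: word error rate (WER) as
# (substitutions + deletions + insertions) / reference length,
# computed via word-level Levenshtein edit distance.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1   # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,              # deletion
                           dp[i][j - 1] + 1,              # insertion
                           dp[i - 1][j - 1] + cost)       # match/substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("order the blue book", "order a blue book"))  # one error in four words: 0.25
```

Note that WER can exceed 100% when the hypothesis inserts many spurious words, which is one reason noisy fielded conditions degrade it so sharply.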
Speech engineering for a specific application can reduce WER, but dialogue requires more than perfect transcription; it requires both the speaker's meaning and her intent. Once it has recognized the other's intent, a dialogue participant must also respond appropriately. An SDS tries to confirm its understanding with the user through the kinds of grounding behaviors people use with one another.
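One common way an SDS operationalizes such grounding behaviors is to choose a confirmation act from the ASR confidence score. The sketch below is a hypothetical illustration of that general technique, not the authors' implementation; the threshold values and function name are assumptions.

```python
# Hypothetical sketch of a confidence-thresholded grounding policy,
# a common SDS technique (not the system described in this paper).
# Threshold values are illustrative assumptions.

def grounding_act(hypothesis: str, confidence: float) -> str:
    if confidence >= 0.90:
        # High confidence: accept silently and proceed.
        return f"ACCEPT: {hypothesis}"
    if confidence >= 0.60:
        # Implicit confirmation: echo the understanding and move on,
        # giving the user a chance to object.
        return f"IMPLICIT: Ordering {hypothesis}. Anything else?"
    if confidence >= 0.30:
        # Explicit confirmation: ask a yes/no question before acting.
        return f"EXPLICIT: Did you say {hypothesis}?"
    # Very low confidence: fall back to a repeat request.
    return "REJECT: I'm sorry, could you repeat that?"

print(grounding_act("Moby Dick", 0.95))  # ACCEPT: Moby Dick
print(grounding_act("Moby Dick", 0.40))  # EXPLICIT: Did you say Moby Dick?
```

As the paper notes, such fallback responses are brittle; the fixed thresholds here illustrate exactly the kind of pre-specified behavior the proposed metacognitive approach aims to move beyond.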

