Accenting and Information Status
Julia Hirschberg
CS 4706
2/28/2011

Information Status
• Topic/comment, theme/rheme
  The orangutan we wanted to buy escaped from the pet store.
• Focus of attention
  I only bought candy for that orangutan.
• Given/new
  I only bought candy for that orangutan. I would never buy an ape drugs!
• All commonly signaled in human speech by intonation

Today: Accent and Given/New
• Motivation in speech technology
• Models of Given/New
• Experiments on Given/New and pitch accent
• Possible models of intonation wrt given/new entities
• How might we identify given/new information automatically?
• How should we produce given/new information appropriately?
• Why is this important?

A Simple Definition
• Given: Recoverable from some form of context or, what a Speaker believes to be in a Hearer’s consciousness
• New: Not recoverable from context or, what a Speaker believes is not in a Hearer’s consciousness

Role in Speech Technologies
• TTS: Natural production
– Given information is often deaccented
– New information is usually accented
• ASR: Improved recognition
– Given information may already have been recognized earlier
– New information may be an important cue to topic shift
• Summarization: Improved precision
– Given information less likely to be included in a summary; new information more likely
• Spoken Dialogue Systems: Grounding
– Critical for system to convey what is given and what is new to facilitate Hearer comprehension

Prince ’81: A More Complex Model
• Speaker (S) and Hearer (H), in a discourse, construct a discourse model
– Includes discourse entities, attributes, and links between entities
– Discourse entities: individuals, classes, exemplars, substances, concepts (NPs)
• Entities when first introduced are new
– Brand-new (H must create a new entity)
  My dog bit a rhinoceros this morning.
– Unused (H already knows of this entity)
  The sun came out this morning.
• Evoked entities are old, or ‘given’ -- already
in the discourse
– Explicitly evoked (in text or speech)
  The rhinoceros was wearing suspenders. Rather unusual for a rhino.
– Situationally evoked
  Watch out for the snake!
• Inferables are also old, or ‘given’
  I bought a new car. The gear shift is a bit tricky.

Prince ’92: A Still More Complex Model
• Hearer-centric information status:
– Given: what S believes H has in his/her consciousness
– New: what S believes H does not have in his/her consciousness
• But discourse entities may also be given and new wrt the current discourse
– Discourse-old: already evoked in the discourse
– Discourse-new: not evoked

The stars are very bright tonight (Hearer-given; Discourse-new)
When I see stars this bright, I think of my vacations in the mountains. (Hearer-given; Discourse-given)
My friend Buddy and I would sneak out late at night. (Hearer-new; Discourse-new)
I said, “My friend BUDDY…” (Hearer-new; Discourse-given)

Given/New and Pitch Accent
• New information is often accented and given information is often deaccented (Halliday ’67, Brown ’83, Terken ’84)
– But there are many exceptions: a simple TTS rule (accent ‘new’ and deaccent ‘given’) will make 25-30% errors
– How can we reduce these errors, to produce human-like intonation?

Brown ’83: Accent Status and Subclasses of Given/New
• Speech elicitation in laboratory
– 12 Scottish-English undergrads
– A describes a diagram for B to draw, which B cannot see
  Draw a black triangle.
  Draw a circle in the middle.
  Draw a blue triangle next to the black one with a line from the top angle to the bottom.
• Analysis: based on Prince ’81 categories with modifications
– Brand-new (a triangle), given: inferable (middle, angle), given: contextually evoked (the page), given: ‘textually’ evoked (divided into current topic vs.
earlier mention)
– Accent status of all entity-referring NPs
• Results:
– Brand-new information accented (87%)
  • Note: new entity/old expression issue
– Given: contextually evoked information deaccented (98%)
– Given: ‘textually’ evoked deaccented (current topic 100%; earlier: 96%)
– Given: inferable information accented (79%)

Boston Directions Corpus (Hirschberg & Nakatani ’96)
• Experimental Design
• 12 speakers: 4 used
• Spontaneous and read versions of 9 direction-giving tasks (monologues)
• Corpus: 50m read; 67m spon
• Labeling
– Prosodic: ToBI intonational labeling
– Given/new (Prince ’92), grammatical function, p.o.s.,…

Boston Directions Corpus: Describe how to get to MIT from Harvard
d1: dsp1: step 1: enter and get token
  first
  enter the Harvard Square T stop
  and buy a token
d2: dsp2: inbound on red line
  then
  proceed to get on the
  inbound
  um
  Red Line
  uh subway
dp3 dsp3: take subway from hs, to cs to ks
  and
  take the subway
  from Harvard Square
  to Central Square
  and then to Kendall Square
dp4: dsp4: get off T.
  then get off the T

Hearer and Discourse Given/New Labeling
  first
  enter the Harvard Square T stop
  and buy a token
  then
  proceed to get on the
  inbound
  um
  Red Line
  uh subway
  and
  take the subway
  from Harvard Square
  to Central Square
  and then to Kendall Square
  then get off the T

Hearer and Discourse Given/New Labeling
  first
  enter <HG/DN the Harvard Square T stop>
  and buy <HI/DN a token>
  then
  proceed to get on <HI/DN the
  inbound
  um
  Red Line
  uh subway>
  and
  take <HG/DG the subway>
  from <HG/DG Harvard Square>
  to <HG/DN Central Square>
  and then to <HG/DN Kendall Square>
  then get off <HG/DG the T>

Does Given/New Status Predict Deaccenting?

NPs          HG      HI      HN      DG      DN
Deaccented   37.1%   53.9%   26.2%   43.3%   38.8%
Total        1009    406     130     596     950

HG: Hearer Given   HI: Hearer Inferable   HN: Hearer New   DG: Discourse Given   DN: Discourse New
39.4% of (H or D) Given items deaccented…
36.9% of (H or D) New items are deaccented…

And…
Bard ’99: Givenness, deaccenting and intelligibility
• Speech elicited
in laboratory
– Glasgow Scottish-English Map Task
  • Each has a slightly different map
  • A traces a route described by B
• Analysis
– Compare repeated mentions of same items (i.e. given items) wrt accent status
  • Within dialogue
  • Across dialogue
• Findings
– Deaccenting rare in repeated mentions (15% within dialogues and 6% across dialogues)
– But repeated mentions were ‘less intelligible’
• Caveats:
– Were they really identifying ‘deaccenting’ (the absence of a pitch accent)?
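
The Boston Directions Corpus figures and the simple accent-‘new’/deaccent-‘given’ rule discussed earlier can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in any of the cited studies. The counts and percentages are read off the deaccenting table; the names TABLE, pooled_rate, LabeledNP and naive_accent are hypothetical, and treating ‘given’ as discourse-given in the rule is an assumption, since the slides do not say which dimension a TTS rule would use.

```python
from dataclasses import dataclass

# (NP total, proportion deaccented) per category, as read off the slide's table.
TABLE = {
    "HG": (1009, 0.371),  # Hearer Given
    "HI": (406, 0.539),   # Hearer Inferable
    "HN": (130, 0.262),   # Hearer New
    "DG": (596, 0.433),   # Discourse Given
    "DN": (950, 0.388),   # Discourse New
}


def pooled_rate(categories):
    """Count-weighted deaccenting rate (in percent) over several categories."""
    total = sum(TABLE[c][0] for c in categories)
    deaccented = sum(TABLE[c][0] * TABLE[c][1] for c in categories)
    return 100.0 * deaccented / total


@dataclass
class LabeledNP:
    text: str
    hearer: str     # "HG", "HI" or "HN"
    discourse: str  # "DG" or "DN"


def naive_accent(np):
    """Hypothetical baseline rule: deaccent discourse-given NPs, accent the rest.

    Assumption: 'given' is taken to mean discourse-given (DG).
    """
    return "deaccent" if np.discourse == "DG" else "accent"


# A few NPs from the labeled direction-giving example.
nps = [
    LabeledNP("the Harvard Square T stop", "HG", "DN"),
    LabeledNP("a token", "HI", "DN"),
    LabeledNP("the subway", "HG", "DG"),
    LabeledNP("Central Square", "HG", "DN"),
]

for np in nps:
    print(f"{np.text}: {np.hearer}/{np.discourse} -> {naive_accent(np)}")

print(f"(H or D) Given deaccented: {pooled_rate(['HG', 'DG']):.1f}%")
print(f"(H or D) New deaccented:   {pooled_rate(['HN', 'DN']):.1f}%")
```

Pooling HG and DG this way gives roughly 39.4%, in line with the slide. The pooled ‘new’ value is recomputed from rounded percentages, so it may differ slightly from the slide’s 36.9%. Either way the two rates are close, which is why a rule keyed only on given/new status predicts deaccenting poorly on this corpus.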