Columbia COMS W4705 - Summarization and Generation




Outline:
Summarization and Generation
What is Summarization?
Summarization is not the same as Language Generation
Summarization Tasks
Input Data -- STREAK
Revision rule
Briefings
How is summarization done?
Sample Template
Document Summarization
Summarization Process
Problems with Sentence Extraction
Cut and Paste in Professional Summarization
Major Cut and Paste Operations
Summarization at Columbia
Cut and Paste Based Single Document Summarization -- System Architecture
(1) Decomposition of Human-written Summary Sentences
Sample Decomposition Output
Decomposition of human-written summaries
Algorithm for Decomposition
(2) Sentence Reduction
Algorithm for Sentence Reduction
Step 1: Use linguistic knowledge to decide what MUST NOT be removed
Step 2: Determining context importance based on lexical links
Step 3: Compute probabilities of humans removing a phrase
Step 4: Make the final decision
Evaluation of Reduction
Multi-Document Summarization Research Focus
Approach
Newsblaster
Newsblaster Architecture
Fusion
Sentence Fusion Computation
Tracking Across Days
Different Perspectives
Multilingual Summarization
Issues for Multilingual Summarization
Multilingual Redundancy
Multilingual Similarity-based Summarization
Similarity Computation
Sentence 1
Sentence Simplification
Simplification Examples
Results on alquds.co.uk.195
Evaluation
User Study: Objectives
User Study: Design
User Study: Execution
“Geneva” Prompt
Measuring Effectiveness
User Satisfaction
User Study: Conclusions
Email Summarization
Email Summarization: Approach
Email Summarization by Sentence Extraction
Data for Sentence Extraction
Sample Automatically Generated Summary (ACM0100)
Information Gathering Email: The Problem
Detection of Questions
Detection of Answers
Integrating QA detection with summarization
Integrated in Microsoft Outlook
Meeting Summarization (joint with Berkeley, SRI, Washington)
Conclusions

1. Summarization and Generation
CS 4705

2. What is Summarization?
- Data as input (database, software trace, expert system), text summary as output
- Text as input (one or more articles), paragraph summary as output
- Multimedia in input or output
- Summaries must convey maximal information in minimal space

3. Summarization is not the same as Language Generation
- Karl Malone scored 39 points Friday night as the Utah Jazz defeated the Boston Celtics 118-94.
- Karl Malone tied a season high with 39 points Friday night….
- … the Utah Jazz handed the Boston Celtics their sixth straight home defeat 118-94.
Streak, Jacques Robin, 1993

4. Summarization Tasks
- Linguistic summarization: How to pack in as much information as possible in as short an amount of space as possible?
  - Streak: Jacques Robin
  - MAGIC: James Shaw
  - PLanDoc: Karen Kukich, James Shaw, Rebecca Passonneau, Hongyan Jing, Vasilis Hatzivassiloglou
- Conceptual summarization: What information should be included in the summary?

5. Input Data -- STREAK
Input data                               Generated text
score (Jazz, 118), score (Celtics, 94)   The Utah Jazz beat the Celtics 118 - 94.
points (Malone, 39)                      Karl Malone scored 39 points
location(game, Boston)                   It was a home game for the Celtics
#home-defeats(Celtics, 6)                It was the 6th straight home defeat

6. Revision rule
[Diagram: a tree headed by "beat" with nodes Jazz and Celtics is revised into a tree headed by "hand" with nodes Jazz, defeat, and Celtics.]

7. SUMMONS QUERY OUTPUT
Summary: Wednesday, April 19, 1995, CNN reported that an explosion shook a government building in Oklahoma City. Reuters announced that at least 18 people were killed. At 1 PM, Reuters announced that three males of Middle Eastern origin were possibly responsible for the blast. Two days later, Timothy McVeigh, 27, was arrested as a suspect, U.S. attorney general Janet Reno said. As of May 29, 1995, the number of victims was 166.
Image(s): 1 (okfed1.gif) (WebSeek)
Article(s):
(1) Blast hits Oklahoma City building
(2) Suspects' truck said rented from Dallas
(3) At least 18 killed in bomb blast - CNN
(4) DETROIT (Reuter) - A federal judge Monday ordered James
(5) WASHINGTON (Reuter) - A suspect in the Oklahoma City bombing
Summons, Dragomir Radev, 1995

8. Briefings
- Transitional
- Automatically summarize series of articles
- Input = templates from information extraction
- Merge information of interest to the user from multiple sources
- Show how perception changes over time
- Highlight agreement and contradictions
- Conceptual summarization: planning operators
  - Refinement (number of victims)
  - Addition (Later template contains perpetrator)

9. How is summarization done?
- 4 input articles parsed by information extraction system
- 4 sets of templates produced as output
- Content planner uses planning operators to identify similarities and trends
  - Refinement (Later template reports new # victims)
- New template constructed and passed to sentence generator

10. Sample Template
Message ID: TST-COL-0001
Secsource: source: Reuters
Secsource: date: 26 Feb 93, Early afternoon
Incident: date: 26 Feb 93
Incident: location: World Trade Center
Incident: Type: Bombing
Hum Tgt: number: At least 5

11. Document Summarization
- Input: one or more text documents
- Output: paragraph length summary
- Sentence extraction is the standard method
  - Using features such as key words, sentence position in document, cue phrases
  - Identify sentences within documents that are salient
  - Extract and string sentences together
  - Luhn, 1950s
  - Hovy and Lin, 1990s
  - Schiffman, 2000
- Machine learning for extraction
  - Corpus of document/summary pairs
  - Learn the features that best determine important sentences
  - Kupiec 1995: Summarization of scientific articles
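
The sentence-extraction method on the Document Summarization slide (key words, sentence position, cue phrases) can be sketched roughly as below. This is a minimal illustration and not the method of any system named in the slides: the stopword list, the cue phrases, the equal feature weights, and every function name are assumptions made for the example. The weights are set by hand here, whereas the machine-learning approach on the slide would learn them from document/summary pairs.

```python
import re
from collections import Counter

# All three constants are assumptions chosen for illustration only.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was",
             "that", "it", "for", "on", "as", "at", "by"}
CUE_PHRASES = ("in summary", "in conclusion")
WEIGHTS = {"keyword": 1.0, "position": 1.0, "cue": 1.0}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def score_sentences(sentences):
    """Return one salience score per sentence from three simple features."""
    freq = Counter(w for s in sentences for w in tokenize(s))
    max_freq = max(freq.values()) if freq else 1
    n = len(sentences)
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Key words: average document frequency of the sentence's words, in [0, 1].
        keyword = sum(freq[w] for w in words) / (len(words) * max_freq) if words else 0.0
        # Position: earlier sentences score higher.
        position = 1.0 - i / n
        # Cue phrases: 1.0 if the sentence contains any listed phrase.
        cue = 1.0 if any(p in sentence.lower() for p in CUE_PHRASES) else 0.0
        scores.append(WEIGHTS["keyword"] * keyword
                      + WEIGHTS["position"] * position
                      + WEIGHTS["cue"] * cue)
    return scores


def extract_summary(text, k=2):
    """Pick the k highest-scoring sentences and string them together in document order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))
```

Calling `extract_summary(text, k=2)` returns the two most salient sentences in their original order. Because sentences are copied whole, any extraneous phrases inside them come along unchanged.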
12. Summarization Process
- Shallow analysis instead of information extraction
- Extraction of phrases rather than sentences
- Generation from surface representations in place of semantics

13. Problems with Sentence Extraction
- Extraneous phrases
  “The five were apprehended along Interstate 95, heading south in vehicles containing an array of gear including … ...

