CS276B: Web Search and Mining, Winter 2005
Lecture 8: Information Extraction II
Feb 1, 2005
(Includes slides borrowed from BBN, David Blei, Nick Kushmerick, Andrew McCallum, and Ray Mooney)

Information Extraction

Example: extracting the Author, Title, and Journal fields from the headers and references of a paper, e.g.:
Leslie Pack Kaelbling, Michael L. Littman and Andrew W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, pages 237-285, May 1996.

Plan for IE

First class:
- Introduction to the IE problem
- Wrappers and wrapper induction
- Traditional NLP-based IE: the MUC competitions

Today:
- Pattern-learning systems: Rapier
- Probabilistic sequence models: HMMs

Learning for IE

- Writing accurate patterns for each slot for each domain (e.g., each web site) requires laborious software engineering.
- The alternative is to use machine learning:
  - Build a training set of documents paired with human-produced filled extraction templates.
  - Learn extraction patterns for each slot using an appropriate machine learning algorithm.

Automatic Pattern-Learning Systems

Pros:
- Portable across domains
- Tend to have broad coverage
- Robust in the face of degraded input
- Automatically find appropriate statistical patterns
- System knowledge is not needed by those who supply the domain knowledge

Cons:
- Annotated training data, and lots of it, is needed
- Not necessarily better or cheaper than a hand-built solution

Examples: AutoSlog (Riloff et al.), WHISK (Soderland, UMass), Rapier (Mooney et al., UTexas), Ciravegna (Sheffield); these learn lexico-syntactic patterns from templates.
(Architecture diagram: a Trainer takes language input plus answers and produces a Model; a Decoder applies the Model to new language input to produce answers.)

Rapier [Califf & Mooney, AAAI-99]

- Rapier learns three regex-style patterns for each slot: a pre-filler pattern, a filler pattern, and a post-filler pattern.
- It is one of several recent trainable IE systems that incorporate linguistic constraints.
- (See also: SIFT [Miller et al., MUC-7]; SRV [Freitag, AAAI-98]; WHISK [Soderland, MLJ-99].)

RAPIER rules for extracting "transaction price":
- "…paid $11M for the company…"
- "…sold to the bank for an undisclosed amount…"
- "…paid Honeywell an undisclosed price…"

Part-of-Speech Tags & Semantic Classes

- Part of speech: the syntactic role of a specific word: noun (nn), proper noun (nnp), adjective (jj), adverb (rb), determiner (dt), verb (vb), "." ("."), …
- NLP offers well-known algorithms for automatically assigning POS tags to English, French, Japanese, … (>95% accuracy).
- Semantic classes: synonyms or other related words
  - "Price" class: price, cost, amount, …
  - "Month" class: January, February, March, …, December
  - "US State" class: Alaska, Alabama, …, Washington, Wyoming
- WordNet: a large online thesaurus containing (among other things) semantic classes.

Rapier Rule Matching Example

"…sold to the bank for an undisclosed amount…"
POS tags: vb pr det nn pr det jj nn
Semantic class: "amount" belongs to the price class

"…paid Honeywell an undisclosed price…"
POS tags: vb nnp det jj nn
Semantic class: "price" belongs to the price class

Rapier Rules: Details

A Rapier rule consists of a pre-filler pattern, a filler pattern, and a post-filler pattern, where:
- pattern := subpattern+
- subpattern := constraint+
A constraint is one of:
- Word: an exact word that must be present
- Tag: the matched word must have the given POS tag
- Class: the semantic class of the matched word
- Disjunction can be specified with "{…}"
- List of length N: between 0 and N words satisfying the other constraints

Rapier's Learning Algorithm

Input: a set of training examples (a list of documents annotated with "extract this substring")
Output: a set of rules
Init: Rules = a rule that exactly matches each training example
Repeat several times:
- Seed: select M examples randomly and generate the K most-accurate, maximally general filler-only rules (pre-filler = post-filler = "true").
- Grow: for N = 1, 2, 3, …, try to improve the K best rules by adding N words of pre-filler or post-filler context.
- Keep: Rules = Rules ∪ {the best of the K rules} − {subsumed rules}

Learning Example (One Iteration)

Two training examples:
- "… located in Atlanta, Georgia…"
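The rule-matching semantics sketched on the "Rapier rule matching example" and "Rapier rules: details" slides can be illustrated in code. This is a simplified invention, not Rapier's actual implementation: each constraint here matches exactly one token (real Rapier also supports bounded-length list elements), and the semantic-class table and all names are made up for the example.

```python
# A minimal sketch of Rapier-style rule matching (simplified: one token
# per constraint). Names and the tiny class table are illustrative only.

SEMANTIC_CLASSES = {"price": {"price", "cost", "amount"}}

def constraint_matches(constraint, token, tag):
    """A constraint may restrict the exact word, its POS tag, or its
    semantic class; sets model the {...} disjunctions from the slides."""
    if "words" in constraint and token.lower() not in constraint["words"]:
        return False
    if "tags" in constraint and tag not in constraint["tags"]:
        return False
    if "class" in constraint:
        if token.lower() not in SEMANTIC_CLASSES[constraint["class"]]:
            return False
    return True

def match_rule(rule, tokens, tags):
    """Return the (start, end) token span of the filler if the rule's
    pre-filler, filler, and post-filler patterns match in sequence."""
    pattern = rule["pre"] + rule["filler"] + rule["post"]
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        pairs = zip(tokens[i:i + n], tags[i:i + n])
        if all(constraint_matches(c, w, t)
               for c, (w, t) in zip(pattern, pairs)):
            start = i + len(rule["pre"])
            return (start, start + len(rule["filler"]))
    return None

# Hypothetical rule for an amount filler: a preposition pre-filler, then
# a <det> <jj> <nn in the price class> filler, with an empty post-filler.
rule = {"pre":    [{"tags": {"pr"}}],
        "filler": [{"tags": {"det"}}, {"tags": {"jj"}},
                   {"tags": {"nn"}, "class": "price"}],
        "post":   []}

tokens = "sold to the bank for an undisclosed amount".split()
tags = ["vb", "pr", "det", "nn", "pr", "det", "jj", "nn"]
span = match_rule(rule, tokens, tags)  # matches "an undisclosed amount"
```

Note how the tag constraints alone would also accept "to the bank …", but the price-class constraint on the final noun rejects "bank"; this is exactly the extra discriminating power the semantic-class slide motivates.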
- "… offices in Kansas City, Missouri…"
Init produces maximally specific rules (high precision, low recall); Seed produces maximally general rules (low precision, high recall); Grow arrives at an appropriately general rule (high precision, high recall).

Statistical Generative Models

- The previous discussion examined systems that use explicit extraction patterns/rules.
- Sequence models are statistical models of whole token sequences that, in effect, label subsequences.
- The best-known case is the generative hidden Markov model (HMM).

Pros:
- Well-understood underlying statistical models make it easy to use a wide range of tools from statistical decision theory.
- Portable, broad coverage, robust, good recall.

Cons:
- The range of usable features and patterns may be limited.
- Not necessarily as good for complex multi-slot patterns.

Name Extraction via HMMs

(Pipeline diagram: speech → speech recognition → text → extractor, which uses NE models to produce entities: locations, persons, organizations. A training program estimates the NE models from training sentences paired with answers.)

Example sentence: "The delegation, which included the commander of the U.N. troops in Bosnia, Lt. Gen. Sir Michael Rose, went to the Serb stronghold of Pale, near Sarajevo, for talks with Bosnian Serb leader Radovan Karadzic."

An easy but successful HMM application:
- Prior to 1997, no learning approach was competitive with hand-built rule systems.
- Since 1997, statistical approaches (BBN (Bikel et al.
1997), NYU, MITRE, CMU/JustSystems) have achieved state-of-the-art performance.

Applying HMMs to IE

- Document ⇒ generated by a stochastic process modelled by an HMM
- Token ⇒ word
- State ⇒ the "reason/explanation" for a given token:
  - A "Background" state emits tokens like "the", "said", …
  - A "Money" state emits tokens like "million", "euro", …
  - An "Organization" state emits tokens like "university", "company", …
- Extraction: via the Viterbi algorithm, a dynamic-programming technique for efficiently computing the most likely sequence of states that generated a document.

HMM Formalism

An HMM consists of:
- states s1, s2, … (including a special start state s1 and a special end state sn)
- a token alphabet a1, a2, …
- state transition probabilities P(si | sj)
- token emission probabilities P(ai | sj)
Widely used in many
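The extraction step described under "Applying HMMs to IE" can be sketched with a toy two-state model (Background and Money states) and a log-space Viterbi decoder. All probabilities, state names, and vocabulary below are invented for illustration; a real system such as BBN's would use many more states and learned parameters.

```python
# A minimal Viterbi decoder for the HMM formalism above: transition
# probabilities P(s_i | s_j) and emission probabilities P(a_i | s_j).
# All numbers here are toy values invented for illustration.
import math

STATES = ["Background", "Money"]
START = {"Background": 0.9, "Money": 0.1}
TRANS = {"Background": {"Background": 0.8, "Money": 0.2},
         "Money":      {"Background": 0.4, "Money": 0.6}}
EMIT = {"Background": {"paid": 0.2, "for": 0.2, "the": 0.2, "company": 0.2},
        "Money":      {"$": 0.3, "11": 0.3, "million": 0.3}}
FLOOR = 1e-6  # small floor probability for unseen tokens

def viterbi(tokens):
    """Most likely state sequence for a token sequence (log-space DP)."""
    def e(s, w):
        return math.log(EMIT[s].get(w, FLOOR))
    # Initialize with start-state and first-token emission scores.
    score = {s: math.log(START[s]) + e(s, tokens[0]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t+1
    for w in tokens[1:]:
        prev, new = {}, {}
        for s in STATES:
            best = max(STATES,
                       key=lambda p: score[p] + math.log(TRANS[p][s]))
            prev[s] = best
            new[s] = score[best] + math.log(TRANS[best][s]) + e(s, w)
        back.append(prev)
        score = new
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return list(reversed(path))

labels = viterbi(["paid", "$", "11", "million", "for", "the", "company"])
# "$ 11 million" is decoded as a run of Money states, the rest Background.
```

Extraction then reduces to reading off maximal runs of non-Background states, which is why the slides call the state the "reason/explanation" for each token.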