Information Extraction
CS4705

Information Extraction (IE) - Task
Idea: ‘extract’ or tag particular types of information from arbitrary text or transcribed speech

Named Entity Tagger
Identify types and boundaries of named entity
◦ For example:
Alexander Mackenzie, (January 28, 1822 ‐ April 17, 1892), a building contractor and writer, was the second Prime Minister of Canada from ….
-> <PERSON>Alexander Mackenzie</PERSON>, (<TIMEX>January 28, 1822</TIMEX> ‐ <TIMEX>April 17, 1892</TIMEX>), a building contractor and writer, was the second Prime Minister of <GPE>Canada</GPE> from ….

IE for Template Filling
Relation Detection
Given a set of documents and a domain of interest, fill a table of required fields.
• For example:
Number of car accidents per vehicle type and number of casualties in the accidents.

IE for Question Answering
Q: When was Gandhi born?
A: October 2, 1869
Q: Where was Bill Clinton educated?
A: Georgetown University in Washington, D.C.
Q: What was the education of Yassir Arafat?
A: Civil Engineering
Q: What is the religion of Noam Chomsky?
A: Jewish

Approaches
Statistical sequence labeling
Supervised
Semi-supervised and bootstrapping

Approach for NER
<PERSON>Alexander Mackenzie</PERSON>, (<TIMEX>January 28, 1822</TIMEX> ‐ <TIMEX>April 17, 1892</TIMEX>), a building contractor and writer, was the second Prime Minister of <GPE>Canada</GPE> from ….
Statistical sequence labeling techniques can be used – similar to POS tagging
◦ Word-by-word sequence labeling
◦ Example of features
   POS tags
   Syntactic constituents
   Shape features
   Presence in a named entity list

Supervised Approach for relation detection
Given a corpus of annotated relations between entities, train two classifiers:
◦ A binary classifier
   Given a span of text and two entities -> decide if there is a relationship between these two entities
Features
◦ Types of two named entities
◦ Bag of words
◦ POS of words in between
Example:
◦ A rented SUV went out of control on Sunday, causing the death of seven people in Brooklyn
◦ Relation: Type = Accident, Vehicle Type = SUV, casualty = 7, weather = ?
Pros and Cons?

Pattern Matching
How can we come up with these patterns?
Manually?
◦ Task and domain-specific
◦ Tedious, time consuming, not scalable

Semi-supervised approach
AutoSlog-TS (Riloff 1996)
MUC-4 task: extract information about terrorist events in Latin America
Two corpora:
◦ Domain-dependent corpus that contains relevant information
◦ A set of irrelevant documents
Algorithm:
1. Using heuristics, all patterns are extracted from both corpora. For example:
   Rule: <Subj> passive-verb
   <Subj> was murdered
   <Subj> was called
2. Pattern Ranking: The output patterns are then ranked by the frequency of their occurrences in corpus1/corpus2
3. Filter out the patterns by hand

Task 12: (DARPA – GALE year2)
Produce a biography of [person]
1. Name(s), aliases:
2. *Date of Birth or Current Age:
3. *Date of Death:
4. *Place of Birth:
5. *Place of Death:
6. Cause of Death:
7. Religion (Affiliations):
8. Known locations and dates:
9. Last known address:
10. Previous domiciles:
11. Ethnic or tribal affiliations:
12. Immediate family members
13. Native Language spoken:
14. Secondary Languages spoken:
15. Physical Characteristics
16. Passport number and country of issue:
17. Professional positions:
18. Education
19. Party or other organization affiliations:
20. Publications (titles and dates):

Biography – two approaches
To obtain high precision, we handle each slot independently using bootstrapping to learn IE patterns.
To improve the recall, we utilize a biographical sentence classifier

Pattern Matching for Relation Detection
Patterns:
◦ “[CAR_TYPE] went out of control on [TIMEX], causing the death of [NUM] people”
◦ “[PERSON] was born in [GPE]”
◦ “[PERSON] was graduated from [FAC]”
◦ “[PERSON] was killed by <X>”
Matching Techniques
◦ Exact matching
   Pros and Cons?
◦ Flexible matching (e.g., [X] was .* killed .* by [Y])
   Pros and Cons?
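
The difference between exact and flexible matching for the "[PERSON] was killed by <X>" pattern can be tried out with regular expressions. The following is a minimal sketch, not the method used in the lecture. It assumes the text has already been run through a named entity tagger that marks entities inline as <TYPE>...</TYPE>; the helper names (entity, extract), the slot names (victim, agent), and the three sample sentences are all hypothetical and chosen only for illustration.

```python
import re


def entity(slot, types=r"[A-Z_]+"):
    """Regex for one inline-tagged entity, e.g. <PERSON>Jane Roe</PERSON>.

    The entity text is captured under the group name `slot`; the closing
    tag must repeat the opening tag (checked with a backreference).
    """
    return rf"<(?P<{slot}_type>{types})>(?P<{slot}>[^<]+)</(?P={slot}_type)>"


# Exact matching: the words between the two entities must match literally.
EXACT = re.compile(entity("victim", "PERSON") + r" was killed by " + entity("agent"))

# Flexible matching, in the spirit of "[X] was .* killed .* by [Y]":
# any material may appear around "killed" (non-greedy to keep spans short).
FLEXIBLE = re.compile(
    entity("victim", "PERSON") + r" was\b.*?\bkilled\b.*?\bby " + entity("agent")
)


def extract(pattern, text):
    """Return (victim, agent) for the first match, or None if no match."""
    m = pattern.search(text)
    return (m.group("victim"), m.group("agent")) if m else None


# Hypothetical, hand-tagged sample sentences.
plain = "<PERSON>John Doe</PERSON> was killed by <PERSON>Richard Roe</PERSON>"
wordy = ("<PERSON>John Doe</PERSON> was reportedly killed on "
         "<TIMEX>Sunday</TIMEX> by <PERSON>Richard Roe</PERSON>")
misleading = ("<PERSON>John Doe</PERSON> was not killed; a statement was "
              "issued by <PERSON>Richard Roe</PERSON>")

for name, sentence in [("plain", plain), ("wordy", wordy), ("misleading", misleading)]:
    print(name, "exact:", extract(EXACT, sentence),
          "flexible:", extract(FLEXIBLE, sentence))
```

On these samples the exact pattern finds only the plain sentence, while the flexible pattern also recovers the wordy one but wrongly fires on the misleading one. This is one way to see the trade-off behind the "Pros and Cons?" prompts: exact matching tends to favor precision, flexible matching tends to favor recall.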