Empirical Methods in Information Extraction
By Claire Cardie
Presentation by Dusty Sargent

Background
- Domain-specific task that differs from the more general problems studied so far
- Summarizes important points in a text with respect to a target topic
- Structures information for storage in a database

Background (cont'd)
- MUC (Message Understanding Conference) evaluates systems
- Provides answer keys and texts for a particular topic
- Recall = (# correct slot fillers in output template) / (# slot fillers in answer key)
- Precision = (# correct slot fillers in output template) / (# slot fillers in output template)
- Has been used in practical applications

Applications
- Summarize medical records (test results, diagnoses, symptoms, etc.)
- Extract information about terrorist activities from radio or television broadcasts
- Keep records of corporate mergers and acquisitions
- Build knowledge bases from information found on websites
- Create job listings from web-based classified ads, job-search sites, and newsgroups

Performance
- State-of-the-art systems reach 50% recall and 70% precision on complicated extraction problems
- Can reach 90% precision and recall on the easiest extraction tasks
- The human error rate is also high for information extraction
- The best systems have only twice the error rate of human experts trained for the same task
- Still a lot of room for improvement: the development phase is time-consuming, and the cause of errors is difficult to determine

Architecture
- Traditional NLP approach: full syntactic and semantic analysis of the input text
- Less common simple approach: keyword matching with little linguistic analysis

Architecture (cont'd)
- Tagging and tokenization: divide input into sentences and words, part-of-speech tag, and disambiguate word senses
- Sentence analysis: partial parse and tag with respect to semantic roles
- Extraction: identify relevant entities and the relations between them, specific to the domain
- Merging: coreference resolution between extracted entities and events
- Template generation: map extracted information into a domain-specific output format

Corpus-based Learning
- Used for the underlying tasks of information extraction
- Can be applied to the preliminary stages of the architecture
- Difficult to find enough training data for all the levels of analysis required
- Expensive to retrain the system for each domain to which it must be applied
- Standard NLP learning techniques are difficult to apply to the later stages: learning extraction patterns, coreference resolution, template generation
- A new training corpus is needed for each task; difficult to learn general patterns from answer keys

Learning Extraction Patterns
- Use general pattern-matching techniques for the extraction phase
- Acquire good extraction patterns from a training corpus with empirical methods
- Similar to the Candidate Elimination Algorithm: extraction patterns are ordered from general to specific, and a balance between the two is needed
- Need general patterns that apply to more than one case
- Patterns must be specific enough that they do not apply in the wrong context

AutoSlog
- One of the earliest systems for learning extraction patterns, by Lehnert and Riloff (1992-1993)
- Learns "concept nodes": domain-specific semantic frames with a maximum of one slot per frame
- Concept nodes are used with the CIRCUS parser for the final extraction task

Concept Node Definition
- Concept: the concept to be extracted, e.g. Damaged-Object
- Trigger: word that activates the pattern
- Position: syntactic position where the concept is likely to be found in the sentence
- Constraints: constraints on the argument at "Position" necessary for extraction to occur; can be hard or soft
- Enabling Conditions: constraints on the linguistic context of the trigger word

Example Application
- Example: "...the twister occurred at approximately 7:15pm and destroyed two mobile homes."
- Concept is Damaged-Object
- Concept node is activated by the trigger word "destroyed"
- Enabling Condition: "destroyed" occurs in the active voice
- Position: direct object of the verb "destroyed"
- Constraints: the direct object of "destroyed" must be a physical object
- Result: "two mobile homes" is extracted to fill the Damaged-Object slot of the concept node

Concept Node Algorithm
- Concept nodes are applied during the partial-parsing phase of the extraction system
- When a trigger word is encountered, check for its enabling conditions
- If they are met, extract the phrase in the appropriate position
- Test the phrase against the constraints
- If the constraints are met, label the phrase as an instance of the concept type

Learning Concept Nodes
- Learning algorithm is specific to the domain
- Requires training text with noun phrases annotated with their concept type, or uses answer keys
- Uses a partial parse and a small set of linguistic patterns to help learn concept nodes
- A newer version, AutoSlog-TS, only needs to be given texts marked as relevant or irrelevant to the domain of the extraction task

Learning Algorithm
- Find a sentence in which a target noun phrase occurs in the training data
- Parse the sentence with the partial parser
- Apply the list of linguistic patterns in order
- If a linguistic pattern applies to the sentence, create a concept node definition from the appropriate elements of the sentence

Learning Example
- "Witnesses confirm that the twister occurred without warning at approximately 7:15pm and destroyed two mobile homes (Damaged_Object)."
- Target noun phrase is "two mobile homes", marked in the training corpus as an instance of the concept Damaged_Object, or found in the Damaged_Object field of the answer key
- Step 1: find the above sentence, in which the target noun phrase occurs, in the training corpus
- Step 2: the parser determines that "two mobile homes" is the direct object of the active verb "destroyed" in the third clause
- Step 3: match the third clause to the following linguistic pattern: <active-voice-verb> followed by <target-np> = <direct-object>
- Step 4: generate the concept node seen previously from the matched constituents, context, concept type, and semantic class

AutoSlog-TS
- Improved version needs only relevant and irrelevant texts as training data
- Adapts AutoSlog to use statistical techniques
- Nearly matches the performance of AutoSlog on the MUC-4 extraction task, using a fraction of the human effort
- Scans the corpus once and generates an extraction pattern for every noun phrase
- Scans the corpus again and ranks the extraction patterns according to some ranking function

PALKA
- Learns
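The MUC recall and precision definitions from the Background slides can be written out as a short Python sketch. The set-based exact matching of fillers is a simplification for illustration; actual MUC scoring also handles partial matches and template alignment:

```python
def score(output_fillers, key_fillers):
    """MUC-style slot scoring over slot fillers (simplified sketch).

    output_fillers: slot fillers produced by the system's output template
    key_fillers: slot fillers listed in the answer key
    Treats fillers as exact-match strings, ignoring partial credit.
    """
    correct = len(set(output_fillers) & set(key_fillers))
    recall = correct / len(key_fillers) if key_fillers else 0.0
    precision = correct / len(output_fillers) if output_fillers else 0.0
    return recall, precision
```

For example, a system that extracts two of three answer-key fillers and nothing spurious scores recall 2/3 and precision 1.0.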
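The concept-node definition and application algorithm described above can be sketched as follows. The `clause` dictionary and both helper functions are hypothetical stand-ins for the CIRCUS parser's output, not the actual AutoSlog/CIRCUS interface:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ConceptNode:
    # Fields mirror the concept-node definition from the slides.
    concept: str              # e.g. "Damaged-Object"
    trigger: str              # word that activates the pattern
    position: str             # syntactic position of the filler
    constraint: Callable      # constraint on the candidate filler
    enabling: Callable        # condition on the trigger's linguistic context

def apply_node(node: ConceptNode, clause: dict) -> Optional[tuple]:
    """Apply a concept node to one parsed clause (illustrative sketch).

    clause maps "verb", "voice", and "positions" (syntactic position ->
    phrase) -- a hypothetical stand-in for partial-parser output.
    """
    if clause["verb"] != node.trigger:        # trigger word encountered?
        return None
    if not node.enabling(clause):             # check enabling conditions
        return None
    phrase = clause["positions"].get(node.position)
    if phrase is not None and node.constraint(phrase):
        return (node.concept, phrase)         # label phrase as an instance
    return None

# Usage with the tornado example; the physical-object check is a stand-in.
clause = {"verb": "destroyed", "voice": "active",
          "positions": {"direct-object": "two mobile homes"}}
node = ConceptNode("Damaged-Object", "destroyed", "direct-object",
                   constraint=lambda phrase: True,
                   enabling=lambda c: c["voice"] == "active")
result = apply_node(node, clause)  # -> ("Damaged-Object", "two mobile homes")
```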
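The second pass of AutoSlog-TS ranks the generated patterns. The slides leave the ranking function unspecified, so the one below (relevance rate scaled by log frequency) is an illustrative assumption, not the system's actual formula:

```python
import math

def rank_patterns(pattern_counts):
    """Rank extraction patterns from most to least promising (sketch).

    pattern_counts: {pattern: (hits_in_relevant_texts, total_hits)}.
    Score = (relevance rate) * log2(frequency); patterns that fire mostly
    in relevant texts, and fire often, rank highest. The exact function
    is an assumption -- the slides only say "some ranking function".
    """
    def score(pattern):
        relevant, total = pattern_counts[pattern]
        if relevant == 0 or total == 0:
            return 0.0
        return (relevant / total) * math.log2(total)
    return sorted(pattern_counts, key=score, reverse=True)
```

A pattern like "destroyed <np>" that fires mostly in relevant texts would outrank a frequent but unselective pattern like "saw <np>".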