CS 388: Natural Language Processing: Information Extraction
Raymond J. Mooney
University of Texas at Austin

Slide titles:
Information Extraction (IE)
Sample Job Posting
Extracted Job Template
Named Entity Recognition
Named Entity Recognition Example
Relation Extraction
Early Information Extraction
MUC
Other Applications
Web Extraction
Amazon Book Description
Extracted Book Template
Template Types
IE as Sequence Labeling
Pattern-Matching Rule Extraction
Regular Expressions
Regular Expression Examples
Enhanced Regex’s (Perl)
Perl Regex’s
Perl Regex Examples
Simple Extraction Patterns
Adding NLP Information to Patterns
Pattern-Match Rule Learning
RAPIER Pattern Induction Example
Evaluating IE Accuracy
IE Experiment in Bioinformatics
Non-Learning Protein Extractors
Learning Methods for Protein Extraction
Biomedical Corpora
Experimental Method
Protein Name Extraction Results AIMed Corpus
Protein Name Extraction Results Yapex Corpus
ELCS (Extraction using Longest Common Subsequences)
Generalizing Rules using Longest Common Subsequence
Protein Interaction Corpus
Protein Interaction Extraction Results (gold-standard protein tags)
Protein Interaction Extraction Results (automated protein tags)
ERK: Relation Extraction using a String Subsequence Kernel
ACE 2002 Newspaper Corpus
Text Mining
Active Learning
Uncertainty Sampling
Rapier Uncertainty Sampling Results
Information Extraction Issues

Information Extraction (IE)
• Identify specific pieces of information (data) in an unstructured or semi-structured textual document.
• Transform unstructured information in a corpus of documents or web pages into a structured database.
• Applied to different types of text:
  – Newspaper articles
  – Web pages
  – Scientific articles
  – Newsgroup messages
  – Classified ads
  – Medical notes

Sample Job Posting
Subject: US-TN-SOFTWARE PROGRAMMER
Date: 17 Nov 1996 17:37:29 GMT
Organization: Reference.Com Posting Service
Message-ID: <[email protected]>

SOFTWARE PROGRAMMER

Position available for Software Programmer experienced in generating software for PC-Based Voice Mail systems. Experienced in C Programming. Must be familiar with communicating with and controlling voice cards; preferable Dialogic, however, experience with others such as Rhetorix and Natural Microsystems is okay. Prefer 5 years or more experience with PC Based Voice Mail, but will consider as little as 2 years. Need to find a Senior level person who can come on board and pick up code with very little training. Present Operating System is DOS. May go to OS-2 or UNIX in future.

Please reply to:
Kim Anderson
AdNET
(901) 458-2888 [email protected]

Extracted Job Template
computer_science_job
id: [email protected]
title: SOFTWARE PROGRAMMER
salary:
company:
recruiter:
state: TN
city:
country: US
language: C
platform: PC \ DOS \ OS-2 \ UNIX
application:
area: Voice Mail
req_years_experience: 2
desired_years_experience: 5
req_degree:
desired_degree:
post_date: 17 Nov 1996

Named Entity Recognition
• Specific type of information extraction in which the goal is to extract formal names of particular types of entities such as people, places, organizations, etc.
• Usually a preprocessing step for subsequent task-specific IE, or other tasks such as question answering.

Named Entity Recognition Example
Entity types: people, places, organizations

U.S. Supreme Court quashes 'illegal' Guantanamo trials

Military trials arranged by the Bush administration for detainees at Guantanamo Bay are illegal, the United States Supreme Court ruled Thursday. The court found that the trials — known as military commissions — for people detained on suspicion of terrorist activity abroad do not conform to any act of Congress. The justices also rejected the government's argument that the Geneva Conventions regarding prisoners of war do not apply to those held at Guantanamo Bay. Writing for the 5-3 majority, Justice Stephen Breyer said the White House had overstepped its powers under the U.S. Constitution. "Congress has not issued the executive a blank cheque," Breyer wrote.

President George W. Bush said he takes the ruling very seriously and would find a way to both respect the court's findings and protect the American people.

Relation Extraction
• Once entities are recognized, identify specific relations between entities:
  – Employed-by
  – Located-at
  – Part-of
• Example:
  – Michael Dell is the CEO of Dell Computer Corporation and lives in Austin, Texas.

Early Information Extraction
• FRUMP (DeJong, 1979) was an early information extraction system that processed news stories and identified various types of events (e.g., earthquakes, terrorist attacks, floods).
• Used “sketchy scripts” of various events to
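
As an illustration of how the job template shown earlier could be filled from the sample posting, here is a minimal Python sketch that uses hand-written regular expressions. It is not the method used by any system named in these slides (such as RAPIER); the function name `extract_job_template`, the `PLATFORM_LEXICON` list, and every pattern are assumptions made for this example, and the slot names are borrowed from the template above. The patterns are tuned to this one posting and would likely need to be generalized or learned for other postings.

```python
import re

# Excerpt of the sample posting from the slides (body shortened).
POSTING = (
    "Subject: US-TN-SOFTWARE PROGRAMMER\n"
    "Date: 17 Nov 1996 17:37:29 GMT\n"
    "Organization: Reference.Com Posting Service\n"
    "\n"
    "SOFTWARE PROGRAMMER\n"
    "Position available for Software Programmer experienced in generating "
    "software for PC-Based Voice Mail systems. Experienced in C Programming. "
    "Prefer 5 years or more experience with PC Based Voice Mail, "
    "but will consider as little as 2 years. "
    "Present Operating System is DOS. May go to OS-2 or UNIX in future.\n"
)

# Hypothetical lexicon of platform names; a real system would need a larger one.
PLATFORM_LEXICON = ["PC", "DOS", "OS-2", "UNIX", "Windows"]

# Hand-written patterns (assumptions for this sketch), one per group of slots.
SUBJECT_RE = re.compile(
    r"^Subject:\s*(?P<country>[A-Z]{2})-(?P<state>[A-Z]{2})-(?P<title>.+)$",
    re.MULTILINE,
)
DATE_RE = re.compile(r"^Date:\s*(\d{1,2} \w{3} \d{4})", re.MULTILINE)
LANGUAGE_RE = re.compile(r"([A-Z][\w+#]*) Programming")
DESIRED_YEARS_RE = re.compile(r"Prefer (\d+) years")
REQUIRED_YEARS_RE = re.compile(r"as little as (\d+) years")


def extract_job_template(text):
    """Fill a few job-template slots; unmatched slots stay None."""
    template = {
        "title": None, "state": None, "country": None, "language": None,
        "platform": [], "req_years_experience": None,
        "desired_years_experience": None, "post_date": None,
    }

    subject = SUBJECT_RE.search(text)
    if subject:
        template.update(subject.groupdict())

    date = DATE_RE.search(text)
    if date:
        template["post_date"] = date.group(1)

    language = LANGUAGE_RE.search(text)
    if language:
        template["language"] = language.group(1)

    desired = DESIRED_YEARS_RE.search(text)
    if desired:
        template["desired_years_experience"] = int(desired.group(1))

    required = REQUIRED_YEARS_RE.search(text)
    if required:
        template["req_years_experience"] = int(required.group(1))

    # Keep lexicon entries that occur as whole tokens, in lexicon order.
    template["platform"] = [
        name for name in PLATFORM_LEXICON
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    ]
    return template


if __name__ == "__main__":
    for slot, value in extract_job_template(POSTING).items():
        print(f"{slot}: {value}")
```

Slots the patterns cannot find are left as `None`, mirroring the empty fields (salary, company, city) in the extracted template. Fixed patterns like these are brittle, which is one reason to learn extraction patterns from examples rather than write them by hand.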

