CSC 9010: AeroText, Ontologies, AeroDAML

AeroText
AeroText Demo
Ontologies
A Simple Ontology: Birthdates
Who and Why?
DAML
UBOT
AeroDAML
Lab: try out AeroDAML

©2003 Paula Matuszek

Dr. Paula Matuszek
[email protected]
(610) 270-6851

AeroText
Information Extraction tool marketed by Lockheed Martin
Capabilities similar to GATE
Much better developed IDE
Less open to extensions of the system itself.
Equally steep learning curve for effective use!
Lockheed AeroText General Overview
Lockheed AeroText White Paper

AeroText Demo

Ontologies
Information Extraction requires modeling extensive domain knowledge
Other applications of text mining, such as document categorization, can also use domain information
In modeling such knowledge we often create an ontology: an explicit formal specification of how to represent the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them.

A Simple Ontology: Birthdates
Objects, concepts, entities:
– months, days, years
– dates
– first names
– last names
– persons
– birthdates
Relationships between them:
– a date has exactly one month, day, year
– a birthdate is a date
– a person has at least 1 first name and exactly 1 last name
– a person has a birthdate
– a birthdate has a person

Who and Why?
Many groups are developing ontologies:
– standardize terms and vocabulary
– facilitate the semantic web
– improve information integration
– interested in the domain itself
Some ontologies under development:
– Cyc
– GO (Gene Ontology)
– UMLS (Unified Medical Language System)
– CIA World Factbook

DAML
DARPA Agent Markup Language
A language for describing ontologies
Example: an ontology for dates
Extensive information available at www.daml.org.

UBOT
UML Based Ontology Toolkit
Part of a DARPA project to automatically mark up web pages to make them
The purpose of DAML is to annotate information on the web to make it machine-readable so that software agents can interpret it and reason with it: the semantic web
http://ubot.lockheedmartin.com/ubot/intro/index.html

AeroDAML
AeroDAML is a web service that takes a web page as input and generates DAML markup.
Uses AeroText as the underlying extraction tool.
Works with various ontologies.
Paper describing system

Lab: try out AeroDAML
AeroDAML page
• Choose a news page (www.phillynews.com, Google News, ...) and tag it with the Cyc and CIA ontologies.
• How well did each ontology do at picking up content? Did they miss things they should have found? Was anything tagged incorrectly?
• Repeat for one of your domain-specific documents, or a web page in a specific area. Try a different ontology if you think one of the others might be more interesting.
• How was the annotation different?
• Are we enabling the semantic
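
Sketch: the birthdate ontology as code
The "A Simple Ontology: Birthdates" slide lists entities and the relationships among them. As a rough illustration only, the same constraints can be written down in plain Python. This is not DAML and not anything AeroDAML produces; the class names, field names, the helper `persons_with_birthdate`, and the sample person are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Date:
    # "a date has exactly one month, day, year": one required field each
    month: int
    day: int
    year: int


@dataclass(frozen=True)
class Birthdate(Date):
    # "a birthdate is a date": modeled here as a subclass with no new fields
    pass


@dataclass(frozen=True)
class Person:
    # "at least 1 first name and exactly 1 last name"
    first_names: Tuple[str, ...]
    last_name: str
    # "a person has a birthdate"
    birthdate: Birthdate

    def __post_init__(self) -> None:
        # Enforce the "at least 1 first name" cardinality constraint
        if len(self.first_names) < 1:
            raise ValueError("a person needs at least one first name")


def persons_with_birthdate(people: List[Person], birthdate: Birthdate) -> List[Person]:
    # "a birthdate has a person": the inverse relation, recovered by lookup
    return [p for p in people if p.birthdate == birthdate]


# Hypothetical example data, not taken from any real record
pat = Person(("Pat", "Lee"), "Example", Birthdate(month=3, day=14, year=1970))
```

Even this small sketch shows why a dedicated ontology language is useful: the cardinality rule has to be enforced by hand in `__post_init__`, and the inverse relation needs its own helper, whereas an ontology language is meant to state such relationships declaratively.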