Villanova CSC 9010 - Introduction to GATE

CSC 9010: Text Mining Applications
Fall, 2003
Introduction to GATE
Dr. Paula Matuszek
[email protected]
Taken primarily from a presentation by Lin Lin. http://webster.cs.uga.edu/~lin/GlobalInfoSys/GATE.ppt
©2003 Paula Matuszek

What is GATE?
Who Uses GATE?
How Can GATE Help?
What are GATE Components?
GATE as an architecture
LRs: Corpora, Documents, and Annotations
Document Processing in GATE
Built-in GATE Components
Develop Language Processing Functionality using GATE
CREOLE
PRs: ANNIE
ANNIE IE Modules
ANNIE Components
ANNIE Component: Tokenizer
Tokenizer Rule
Example Tokenizer Rule
ANNIE Component: Gazetteer
Example Gazetteer List
ANNIE Component: Semantic Tagger
ANNIE Component: Sentence Splitter
ANNIE Component: OrthoMatcher
Create a New Resource
Example: Create a New Component Called GoldFish
Example: Create GoldFish Using BootStrap Wizard
GoldFish: default files created
Create an Application with PRs
Additional Facilities
Embedding ANNIE

What is GATE?
GATE stands for General Architecture for Text Engineering.
The theory behind GATE is SALE (Software Architecture for Language Engineering):
– computer processing of human language
– computer infrastructure for software development

Who Uses GATE?
Scientists performing experiments that involve processing human language
Developers building applications with language processing components
Teachers and students of courses about language and language computation

How Can GATE Help?
Specifies an architecture, or organizational structure, for language processing software
Provides a framework, or class library, that implements the architecture and can be used to embed language processing capabilities in diverse applications
Provides a development environment, built on top of the framework, made up of convenient graphical tools for developing components

What are GATE Components?
Reusable software chunks with well-defined interfaces
The same idea is used in Java Beans and Microsoft's .NET

GATE as an architecture
GATE breaks down into three types of components:
– LanguageResources (LRs): represent entities such as lexicons, documents, corpora, annotation schemas, or ontologies
– ProcessingResources (PRs): represent entities that are primarily algorithmic, such as parsers, generators, or n-gram modelers
– VisualResources (VRs): represent visualization and editing components that participate in GUIs

LRs: Corpora, Documents, and Annotations
A Corpus in GATE is a Java Set whose members are Documents.
Documents are modeled as content plus annotations plus features.
Annotations are organized in graphs, which are modeled as Java sets of Annotation.

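The corpus/document/annotation model described above can be pictured with a small Java sketch. This is only an illustration under stated assumptions, not GATE's own class library: the names SketchAnnotation, SketchDocument, and DataModelSketch are hypothetical, and an annotation is simplified to a typed character span rather than an edge in an annotation graph.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical annotation: a typed span over the document content, with features. */
final class SketchAnnotation {
    final String type;   // e.g. "Token" or "Location"
    final int start;     // inclusive character offset
    final int end;       // exclusive character offset
    final Map<String, Object> features = new HashMap<>();

    SketchAnnotation(String type, int start, int end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

/** Hypothetical document: content plus annotations plus features. */
final class SketchDocument {
    final String content;
    final Set<SketchAnnotation> annotations = new LinkedHashSet<>();
    final Map<String, Object> features = new HashMap<>();

    SketchDocument(String content) {
        this.content = content;
    }

    /** Adds an annotation over [start, end) and returns it so features can be set. */
    SketchAnnotation annotate(String type, int start, int end) {
        SketchAnnotation a = new SketchAnnotation(type, start, end);
        annotations.add(a);
        return a;
    }

    /** Returns the text covered by an annotation. */
    String textOf(SketchAnnotation a) {
        return content.substring(a.start, a.end);
    }
}

final class DataModelSketch {
    /** Builds a one-document corpus; the corpus here is just a Set of documents. */
    static Set<SketchDocument> buildCorpus() {
        SketchDocument doc = new SketchDocument("Alice lives in Paris.");
        doc.features.put("source", "example");
        doc.annotate("Token", 0, 5).features.put("kind", "word");
        doc.annotate("Location", 15, 20);
        Set<SketchDocument> corpus = new LinkedHashSet<>();
        corpus.add(doc);
        return corpus;
    }

    public static void main(String[] args) {
        for (SketchDocument d : buildCorpus()) {
            for (SketchAnnotation a : d.annotations) {
                System.out.println(a.type + ": " + d.textOf(a));
            }
        }
    }
}
```

Keeping the annotations in a set beside the content, rather than inside it, leaves the original text untouched while any number of components add their own layers.
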
Document Processing in GATE
A document is:
– Read in one of several formats, including XML, RTF, email, HTML, SGML, and plain text.
– Identified and converted into GATE annotation format.
– Processed by PRs.
– Stored, with its results, in a serial data store (based on Java serialization) or as XML.

Built-in GATE Components
Resources for common LE data structures and algorithms, including documents, corpora, and various annotation types
A set of language analysis components for Information Extraction (e.g. ANNIE)
A range of data visualization and editing components

Develop Language Processing Functionality using GATE
Development is done by programming, by developing Language Resources such as grammars that are used by existing Processing Resources, or by a mixture of both.
The development environment is used for:
– visualization of the data structures produced and consumed during processing
– debugging
– performance measurement

CREOLE
A Collection of REusable Objects for Language Engineering
The set of resources integrated with GATE
All the resources are packaged as Java Archive (or 'JAR') files, plus some XML configuration data.

PRs: ANNIE
A family of Processing Resources for language analysis included with GATE
Stands for A Nearly-New Information Extraction system.
Uses finite-state techniques to implement various tasks: tokenization, semantic tagging, verb phrase chunking, and so on.

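The idea of a document being processed by a sequence of PRs can be sketched in plain Java as follows. Everything here is an assumption made for illustration: SketchPR, PipelineDoc, and PipelineSketch are hypothetical names, not GATE interfaces, and the two steps are toy stand-ins for real analysis components.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a document being processed: text plus annotation labels. */
final class PipelineDoc {
    final String text;
    final List<String> annotations = new ArrayList<>();

    PipelineDoc(String text) {
        this.text = text;
    }
}

/** Hypothetical processing-resource interface: one algorithmic step over a document. */
interface SketchPR {
    void execute(PipelineDoc doc);
}

final class PipelineSketch {
    /** Splits on whitespace and records one "Token" entry per piece. */
    static final SketchPR TOKENIZER = doc -> {
        for (String piece : doc.text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                doc.annotations.add("Token:" + piece);
            }
        }
    };

    /** Marks tokens that appear in a tiny hard-coded list of city names. */
    static final SketchPR CITY_LOOKUP = doc -> {
        List<String> cities = Arrays.asList("Paris", "London");
        List<String> found = new ArrayList<>();
        for (String a : doc.annotations) {
            if (a.startsWith("Token:")) {
                String word = a.substring("Token:".length());
                if (cities.contains(word)) {
                    found.add("Lookup:" + word);
                }
            }
        }
        doc.annotations.addAll(found);
    };

    /** Runs each resource in order; later steps see what earlier steps added. */
    static PipelineDoc run(String text, List<SketchPR> resources) {
        PipelineDoc doc = new PipelineDoc(text);
        for (SketchPR pr : resources) {
            pr.execute(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        PipelineDoc doc = run("Alice flew to Paris", Arrays.asList(TOKENIZER, CITY_LOOKUP));
        System.out.println(doc.annotations);
    }
}
```

Order matters in such a pipeline: the lookup step finds nothing unless the tokenizing step has already run.
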
ANNIE IE Modules

ANNIE Components
Tokenizer
Gazetteer
Sentence Splitter
Part of Speech Tagger
– produces a part-of-speech tag as an annotation on each word or symbol.
Semantic Tagger
OrthoMatcher Coreference Module

ANNIE Component: Tokenizer
Token Types
– word, number, symbol, punctuation, and SpaceToken.
A tokenizer rule has a left hand side and a right hand side.

Tokenizer Rule
Operations used on the LHS:
– | (or)
– * (0 or more occurrences)
–

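The five token types named on the tokenizer slide can be illustrated with a rough Java sketch. It uses Java regular expressions, not GATE's tokenizer rule syntax; the class TokenizerSketch and its patterns are assumptions chosen for this example, with regex alternation playing a role loosely similar to the | operator on a rule's left hand side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizerSketch {
    // One named group per token type; alternatives are tried left to right,
    // so the catch-all "symbol" group only matches what nothing else claimed.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<word>\\p{L}+)"
          + "|(?<number>\\p{Nd}+)"
          + "|(?<space>\\s+)"
          + "|(?<punctuation>\\p{P})"
          + "|(?<symbol>.)",
            Pattern.DOTALL);

    private static final String[] KINDS =
            {"word", "number", "space", "punctuation", "symbol"};

    /** Returns "type:text" labels for each token found, in document order. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    // Whitespace runs are reported under the SpaceToken type.
                    String label = kind.equals("space") ? "SpaceToken" : kind;
                    out.add(label + ":" + m.group());
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokenize("Pay $20 now!")) {
            System.out.println(token);
        }
    }
}
```
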

