CSC 9010: Text Mining
Summarization Lab
Dr. Paula Matuszek
[email protected]
(610) 270-6851
©2003 Paula Matuszek

Summarization Tools
We will try out two summarization tools:
– MEAD (tangra.si.umich.edu/clair/meaddemo/demo.cgi)
  – Text summarization project under development at the University of Michigan. There is an online demo that we will be using.
– TextAnalyst (www.megaputer.com/products/ta/index.php3)
  – Application marketed by Megaputer, a data mining/text mining company. An evaluation copy is installed on our lab machines. We will spend most of our time here.

MEAD

TextAnalyst (Extracted primarily from TextAnalyst 2.1 Help System)
TextAnalyst is a natural language text analysis software tool which provides a number of capabilities:
– document summarization
– topic structure extraction
– document navigation
– "natural language search"
The basis of the tool is a network of terms found in a document and the relations between them.
– Words which often occur together are considered related.
See http://www.megaputer.com/tech/wp/tm.php3 for additional information.

Definitions
Concept – A word or words (term or terms) that TextAnalyst identifies as significant in your text. Concepts appear as hyperlinks in text and as list items in tree structures.
Text – A document you load in TextAnalyst. Both .TXT and .RTF file formats are acceptable.
Semantic network – A tree structure of concepts from your text and the relationships between them. This is a concise representation of your text.
Knowledge base – The collection of your text, the semantic network related to your text, any edits you made, the results of your analyses, and hyperlinks within your text.
Information from TextAnalyst Help system

More Definitions
Semantic search – Synonymous with Natural Language Query. You type a question in ordinary English, and TextAnalyst returns results for your examination.
Semantic weight – The semantic weight of a concept is a measure of its importance in your document. When semantic weights are displayed, it is the number closest to the concept in a tree structure. The semantic weight of the relationship between a concept and its parent concept is the leftmost number in the pair; it measures the strength of the relationship between the concept and its parent.
Information from TextAnalyst Help system

Semantic Network
Concepts and relationships among them
Common Natural Language Representation: Tom gave Mary a rose.
[Figure: semantic network for the sentence. The node "Give" (verb: gave) is linked to "Tom" (Giver), "Mary" (Recipient), and "Rose" (Gift).]

Neural Net
Machine Learning Algorithms
Input Layer of nodes
Output Layer of nodes
Zero or more hidden layers
Weighted Links among nodes
Learning methods
– back propagation
– others

TextAnalyst Algorithms
Preprocessing (language-specific)
– Eliminate stop words
– Stem
Statistical Analysis
– proprietary neural network algorithm
– word frequencies
– word combination frequencies
– joint occurrence of words within sentences
– yields network of term strengths and relation strengths

Capabilities
Semantic Analysis: tree structure of concepts and relations
Navigation: concepts in tree are linked to occurrences in text
Summarization: identify "most important" sentences
Natural Language query: query in English
Information from TextAnalyst Help system

Capabilities 2
Knowledge base development: maintain semantic network, links, related dictionaries, etc.
Topic Structure View
Cluster View (especially for multiple documents)
Dictionary development: add, delete terms from automated dictionary
Focused analysis: narrow terms in search
Information from TextAnalyst Help system

LAB:
Using the set of documents from assignment 2, create summaries using both MEAD and TextAnalyst.
Explore some of the different parameters, including using multiple documents.
How well did each tool do with single documents? Multiple documents?
Do you think your documents could have been well-summarized by extracting
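
Illustration: term network and sentence extraction
The statistical steps listed under TextAnalyst Algorithms (stop-word removal, word frequencies, joint occurrence of words within sentences) can be sketched in a few lines of Python. This is only a rough stand-in: TextAnalyst's actual algorithm is a proprietary neural network, and nothing here reproduces it. The stop list, the function names, the sample text, and the scoring rule (average term frequency per sentence) are all assumptions made for illustration, and stemming is skipped.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, tiny stop list for illustration; real tools use far larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are",
              "was", "it", "on", "for", "from", "with"}

def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def terms(sentence):
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]

def build_network(text):
    """Return (term_weights, relation_weights) from raw counts.

    term_weights counts how often each term occurs; relation_weights
    counts in how many sentences each pair of distinct terms co-occurs.
    """
    term_weights = Counter()
    relation_weights = Counter()
    for sentence in split_sentences(text):
        words = terms(sentence)
        term_weights.update(words)
        # Each unordered pair of distinct terms counts once per sentence.
        for pair in combinations(sorted(set(words)), 2):
            relation_weights[pair] += 1
    return term_weights, relation_weights

def summarize(text, n=1):
    """Return the n sentences with the highest average term weight,
    kept in their original order."""
    term_weights, _ = build_network(text)
    sentences = split_sentences(text)

    def score(sentence):
        words = terms(sentence)
        if not words:
            return 0.0
        return sum(term_weights[w] for w in words) / len(words)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

sample = ("Text mining finds patterns in text. "
          "Summarization tools extract important sentences from text. "
          "The weather was nice.")
weights, relations = build_network(sample)
summary = summarize(sample, n=1)
print(weights.most_common(3))
print(summary)
```

Comparing the output of a simple frequency-based extractor like this with the MEAD and TextAnalyst summaries may help when answering the lab questions above, since it shows what plain word counts can and cannot pick out.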


Villanova CSC 9010 - Summarization Lab
