GT CS 7450 - Text and Document Visualization


Text and Document Visualization
CS 4460/7450 - Information Visualization
March 26, 2009
John Stasko
Spring 2009

Text is Everywhere
• We use documents as a primary information artifact in our lives
• Our access to documents has grown tremendously in recent years due to networking infrastructure
  − WWW
  − Digital libraries
  − ...

Big Question
• What can information visualization provide to help users in understanding and gathering information from text and document collections?
• Sidenote: We already encountered this topic a little bit earlier in the term when we discussed a) high-dimensional data and b) visual analytics

Example Tasks & Goals
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that are similar to this one (so they are worthwhile)?
• How are different words used in a document or a document collection?
• What are the main themes and ideas in a document or a collection?
• Which documents have an angry tone?
• How are certain words or themes distributed through a document?
• Identify “hidden” messages or stories in this document collection.
• Quickly gain an understanding of a document or collection in order to subsequently do XYZ.
• Find connections between documents.

Related Topic - IR
• Information Retrieval
  − Active search process that brings back particular/specific items (will discuss that some today, but not always the focus)
  − I think InfoVis and HCI can help
• InfoVis seems to help most when
  − Perhaps not sure precisely what you’re looking for
  − More of a browsing task

Related Topic - Sensemaking
• Sensemaking
  − Gaining a better understanding of the facts at hand in order to take some next steps
  − (Better definitions in VA lecture)
• InfoVis can help make a large document collection more understandable more rapidly

Challenge
• Text is nominal data
  − Does not seem to map to geometric/graphical presentation as easily as ordinal and quantitative data
• The “Raw data --> Data Table” mapping now becomes more important

Today’s Agenda
• Micro-level: More emphasis on individual words, actual document contents
• Macro-level: Emphasis on large document collections, themes and concepts across collection, how documents relate
(Fuzzy boundary between the two)

One Text Visualization
Uses:
• Layout
• Font
• Style
• Color
• …

Tag Clouds
• Currently very “hot” in research community
• Have proven to be very popular on web
• Idea is to show word/concept importance through visual means
  − Tags: User-specified metadata (descriptors) about something
  − Sometimes generalized to just reflect word frequencies

History
• 90-year-old Soviet Constructivism
• Milgram’s ’76 experiment to have people label landmarks in Paris
• Flanagan’s ’97 “Search referral Zeitgeist”
• Fortune’s ’01 Money Makes the World Go Round
(Viégas & Wattenberg, interactions ’08)

Flickr Tag Cloud

delicious Tag Cloud

Alternate Order

Amazon’s Product Concordance
Maybe now a “word cloud”

Sidenote
Alternate text data

Many Eyes Tag Cloud

Problems
• Actually not a great visualization. Why?
  − Hard to find a particular word
  − Long words get increased visual emphasis
  − Font sizes are hard to compare
  − Alphabetical ordering not ideal for many tasks
• Studies have even shown they underperform
(Gruen et al, CHI ’06)

Why So Popular?
• Serve as social signifiers that provide a friendly atmosphere and a point of entry into a complex site
• Act as individual and group mirrors
• Fun, not business-like
(Hearst & Rosner, HICSS ’08)

Wordle
http://www.wordle.net

Concordance
Definition

Concordance in Text
http://www.concordancesoftware.co.uk

Many Eyes’ WordTree

Word Tree
• Shows context of a word or words
  − Follow word with all the phrases that follow it
• Font size shows frequency of appearance
• Continue branch until hitting unique phrase
• Clicking on phrase makes it the focus
• Ordered alphabetically, by frequency, or by first appearance
(Wattenberg & Viégas, TVCG ’08)

Another Word View
http://www.nytimes.com/ref/washington/20070123_STATEOFUNION.html?initialWord=iraq

Another Challenge
• Visualize an entire book
• What does that mean?
• How about showing word appearances?

TextArc
Brad Paley
http://textarc.org
• Sentences laid out in order of appearance
• Words near to where they appear
• Much interaction

Transition 1
• OK, let’s move up a level from that word/text focus

Information Retrieval
• Can InfoVis help IR?
• Assume there is some active search or query
  − Show results visually
  − Show how query terms relate to results
  − …

Improving Text Searches
• What’s wrong with the common search?
• Visualizing the results of search operations is another big area in text infovis

What Hearst Thinks is Wrong
• Query responses do not include:
  − How strong the match is
  − How frequent each term is
  − How each term is distributed in the document
  − Overlap between terms
  − Length of document
• Document ranking is opaque
• Inability to compare between results
• Input limits term relationships

TileBars
• Goal
  − Minimize time and effort for deciding which documents to view in detail
• Idea
  − Show the role of the query terms in the retrieved documents, making use of document structure
(Hearst, CHI ’95)

TileBars
• Graphical representation of term distribution and overlap
• Simultaneously indicate:
  − Relative document length
  − Frequency of term sets in document
  − Distribution of term sets with respect to the document and each other

Interface
• Search terms
• Presentation

Technique
• Relative length of document
• Two search terms
• Blocks indicate “chunks” of text, such as paragraphs
• Blocks are darkened according to the
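
The TileBars idea described above (one row per term set, one block per chunk of text) can be illustrated with a rough sketch. This is not Hearst's implementation. The function name `tilebar`, the blank-line paragraph split, and the text shade characters are all hypothetical choices made for illustration. Because the slide text is cut off before it says what drives block darkness, the sketch simply assumes darkness follows how often a term set occurs in each chunk, in line with the "frequency of term sets" bullet.

```python
import re


def tilebar(document, term_sets, shades=" .:#"):
    """Return one string per term set, with one shade character per chunk.

    Assumptions (not taken from the slides): chunks are paragraphs separated
    by blank lines, and darkness scales with the number of term hits in a
    chunk relative to the largest hit count anywhere in the bar.
    """
    # Split into paragraph-like chunks and drop empty ones.
    chunks = [c for c in re.split(r"\n\s*\n", document) if c.strip()]

    # Count hits per chunk for each term set (case-insensitive, whole words).
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        counts = []
        for chunk in chunks:
            words = re.findall(r"[a-z0-9']+", chunk.lower())
            counts.append(sum(1 for w in words if w in wanted))
        rows.append(counts)

    peak = max((c for row in rows for c in row), default=0)
    darkest = len(shades) - 1

    def shade(count):
        # Zero hits stay blank; otherwise round up onto shade levels 1..darkest.
        if count == 0 or peak == 0:
            return shades[0]
        return shades[(count * darkest + peak - 1) // peak]

    return ["".join(shade(c) for c in row) for row in rows]


if __name__ == "__main__":
    sample = (
        "text visualization helps search\n\n"
        "search results are ranked\n\n"
        "nothing relevant here\n\n"
        "visualization of search search results"
    )
    for row in tilebar(sample, [["search"], ["visualization"]]):
        print("[" + row + "]")
```

Each returned string has one character per chunk, so its length stands in for relative document length, and reading the rows column by column shows where the term sets overlap.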

