GT CS 7450 - Text and Document Visualization 1


Text and Document Visualization 1
CS 7450 - Information Visualization (Fall 2013)
November 11, 2013
John Stasko
Topic Notes

Text is Everywhere
• We use documents as primary information artifact in our lives
• Our access to documents has grown tremendously in recent years due to networking infrastructure
  - WWW
  - Digital libraries
  - ...

Big Question
• What can information visualization provide to help users in understanding and gathering information from text and document collections?

Tasks/Goals
• What kinds of analysis questions might a person ask about text & documents?

Example Tasks & Goals
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that are similar to this one (so they are worthwhile)?
• How are different words used in a document or a document collection?
• What are the main themes and ideas in a document or a collection?
• Which documents have an angry tone?
• How are certain words or themes distributed through a document?
• Identify “hidden” messages or stories in this document collection.
• How does one set of documents differ from another set?
• Quickly gain an understanding of a document or collection in order to subsequently do XYZ.
• Understand the history of changes in a document.
• Find connections between documents.

Related Topic - IR
• Information Retrieval
  - Active search process that brings back particular/specific items (will discuss that some today, but not always focus)
  - I think InfoVis and HCI can help some…
• InfoVis, conversely, seems to be most useful when
  - Perhaps not sure precisely what you’re looking for
  - More of a browsing task than a search one

Related Topic - Sensemaking
• Sensemaking
  - Gaining a better understanding of the facts at hand in order to take some next steps
  - (Better definitions in VA lecture)
• InfoVis can help make a large document collection more understandable more rapidly

Challenge
• Text is nominal data
  - Does not seem to map to geometric/graphical presentation as easily as ordinal and quantitative data
• The “Raw data --> Data Table” mapping now becomes more important

This Week’s Agenda
• Visualization for IR
  - Helping search
• Visualizing text
  - Showing words, phrases, and sentences
• Visualizing document sets
  - Words, entities & sentences
  - Analysis metrics
  - Concepts & themes

Information Retrieval
• Can InfoVis help IR?
• Assume there is some active search or query
  - Show results visually
  - Show how query terms relate to results
  - …

Improving Text Searches
• What’s wrong with the common search?
  - Is there really anything wrong?
• Visualizing the results of search queries is one potentially important area of text infovis

What Hearst Thinks is Wrong
• Query responses do not include:
  - How strong the match is
  - How frequent each term is
  - How each term is distributed in the document
  - Overlap between terms
  - Length of document
• Document ranking is opaque
• Inability to compare between results
• Input limits term relationships
Hearst, CHI ’95

TileBars
• Goal
  - Minimize time and effort for deciding which documents to view in detail
• Idea
  - Show the role of the query terms in the retrieved documents, making use of document structure

TileBars
• Graphical representation of term distribution and overlap
• Simultaneously indicate:
  - Relative document length
  - Frequency of term sets in document
  - Distribution of term sets with respect to the document and each other

Interface
• Search terms
• Presentation

Technique
• Relative length of document
• Two search terms
• Blocks indicate “chunks” of text, such as paragraphs
• Blocks are darkened according to the frequency of the term in the document

Video

Issues
• Horizontal alignment doesn’t match mental model
• May not be the best solution for web searches
  - Non-linear material
  - Images? Apps?
• Anything else?

Generalize More
• How about the “holy grail” of a visual search engine?
  - Hot idea for a while
• My personal view: It’s a mistake in the general case. Text is just better for this.
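
As a rough illustration of the TileBars technique described above, the sketch below splits a document into chunks, counts how often each query term set occurs in each chunk, and renders one row per term set with darker characters for higher counts. It is not Hearst's implementation: the function names (tile_rows, render), the blank-line chunking rule, and the five-level ASCII shading scale are all assumptions made for this example.

```python
import math
import re

# Lightest to darkest; an assumed stand-in for the gray levels of a TileBar.
SHADES = " .:*#"


def tile_rows(document, term_sets, chunk_sep="\n\n"):
    """Return, for each term set, a list of hit counts per chunk.

    Chunks are assumed to be paragraphs separated by blank lines.
    """
    chunks = [c for c in document.split(chunk_sep) if c.strip()]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        row = []
        for chunk in chunks:
            words = re.findall(r"[a-z0-9']+", chunk.lower())
            row.append(sum(1 for w in words if w in wanted))
        rows.append(row)
    return rows


def render(rows):
    """Turn hit counts into one string per term set.

    String length reflects document length in chunks; character
    darkness reflects term frequency relative to the largest count.
    """
    peak = max((count for row in rows for count in row), default=0)
    darkest = len(SHADES) - 1
    lines = []
    for row in rows:
        cells = []
        for count in row:
            if count == 0 or peak == 0:
                level = 0
            else:
                level = math.ceil(count * darkest / peak)
            cells.append(SHADES[level])
        lines.append("".join(cells))
    return lines


if __name__ == "__main__":
    doc = "cats and dogs\n\nonly cats cats here\n\nnothing relevant"
    for line in render(tile_rows(doc, [["cats"], ["dogs"]])):
        print("[" + line + "]")
```

Reading the rows together shows where the term sets overlap: a column that is dark in both rows marks a chunk that mentions both sets.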

Search Visualization
http://www.kartoo.com (Defunct)

Sparkler
• Abstract result documents more
• Show “distance” from query in order to give user better feel for quality of match(es)
• Also shows documents in responses to multiple queries
Havre et al, InfoVis ’01

Visualizing One Query
• Triangle – query
• Square – document
• Distance between query and documents represents their relevance

Visualizing Multiple Queries
• Six queries here
• Bullseye allows viewer to select quality results

Test Example
• Text Retrieval Conference (TREC-3) test document collection
• AP news stories from June 24–30, 1990
• TREC topic: Japan Protectionist Measures
• Sparkler found 16 of 17 relevant documents

Another Idea
Use it to compare search results from different search engines

RankSpiral
Spoerri, InfoVis ’04 poster
Color represents different search engines

ResultMaps
Treemap-style vis for showing query results in a digital library
Clarkson, Desai & Foley, TVCG (InfoVis) ’09

To Learn More
Marti Hearst’s Book, Chapter 10
http://searchuserinterfaces.com/book/

Transition 1
• OK, let’s move up beyond just search/IR
• How do we represent the words, phrases, and sentences in a document or set of documents?
  - Main goal of understanding versus search

One Text Visualization
Uses: Layout Font Style Color …

Word Counts
http://www.nytimes.com/interactive/2012/08/28/us/politics/convention-word-counts.html

More Word Counting
http://www.wordcount.org

Tag/Word Clouds
• Currently very “hot” in research community
• Have proven to be very popular on web
• Idea is to show word/concept importance through visual means
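
One simple way to show word importance visually, as in the tag/word cloud idea above, is to count words and map each count to a font size. The sketch below is only an assumed illustration: the tiny stopword list, the linear count-to-size scaling, and the names word_counts and font_sizes are not from the slides, and real tag-cloud tools may use other weightings and layouts.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this example only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def word_counts(text, stopwords=STOPWORDS):
    """Count lowercase word occurrences, skipping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


def font_sizes(counts, top_n=10, min_pt=10, max_pt=40):
    """Map the top_n most frequent words to point sizes.

    Sizes scale linearly between min_pt and max_pt (an assumed
    choice); if all kept words tie, they all get max_pt.
    """
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    span = highest - lowest
    sizes = {}
    for word, n in top:
        frac = (n - lowest) / span if span else 1.0
        sizes[word] = round(min_pt + frac * (max_pt - min_pt))
    return sizes


if __name__ == "__main__":
    text = "jobs jobs jobs economy economy the the the families"
    for word, size in font_sizes(word_counts(text)).items():
        print(f"{word}: {size}pt")
```

The resulting sizes could then be handed to any layout routine; placement of the words is a separate problem that this sketch does not address.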

