GT CS 7450 - Visualizing Big Data


Visualizing Big Data (Many Cases & Dimensions)
CS 4460/7450 - Information Visualization
Feb. 3, 2009
John Stasko
Spring 2009 CS 4460/7450

Last Time
• We looked at parallel coordinates, one way of projecting >2 variables down onto the 2D plane

Techniques So Far

Potential Limitations
• What happens when you have lots and lots of data cases?

Parallel Coordinates
Out5d dataset (5 dimensions, 16384 data items) (courtesy of J. Yang)

Potential Limitations
• Or, you may have many, many variables
− Hundreds or even thousands

Strategies
• How are we going to deal with such big datasets with so many variables per case?
• Ideas?

General Notion
• Data that is similar in most dimensions ought to be drawn together
− Cluster at high dimensions
• Need to project the data down into the plane and give it some ultra-simplified representation
• Or perhaps only look at certain aspects of the data at any one time

Mathematical Assistance 1
• There exist many techniques for clustering high-dimensional data with respect to all those dimensions
− Affinity propagation
− k-means
− Expectation maximization
− Hierarchical clustering

Mathematical Assistance 2
• There exist many techniques for projecting n-dimensions down to 2-D (dimensionality reduction)
− Multi-dimensional scaling (MDS)
− Principal component analysis
− Linear discriminant analysis
− Factor analysis
Data mining, Knowledge discovery, Comput Sci & Eng courses

Other Techniques
• Other techniques exist to reduce data
− Sampling – We only include every so many data cases or variables
− Aggregation – We combine many data cases or variables

Our Focus
• Visual techniques
• Many are simply graphic transformations from N-D down to 2-D

Example
• Big document collection
• Accumulate all different words used throughout
• Each word becomes a dimension
• Value of that data case (document) in a dimension is the number of times the word appears in that document
• (May be thousands of dimensions)

PNNL’s SPIRE
Each dot is a document
Similarity provokes nearby positioning
Will see more later in term on Text day
Wise et al, InfoVis ‘95

Pluses & Minuses
• Can have as many cases as there are pixels and unlimited number of dimensions
• Shows similarity of data cases
• Only a dot for each case
• Doesn’t say much about dimensions or cases

Use?
• What kinds of questions/tasks would you want such a technique to address?
− Clusters of similar data cases
− Useless dimensions
− Dimensions similar to each other
− Outlier data cases
− …
• Think back to our “cognitive tasks” discussion

Today
• We’ll examine a number of other visual techniques intended for larger, high-dimensional data sets

Can We Make a Taxonomy?
• D. Keim proposes a taxonomy of techniques
− Standard 2D/3D display: Bar charts, scatterplots
− Geometrically transformed display: Parallel coordinates
− Iconic display: Needle icons, Chernoff faces
− Dense pixel display: What we’re about to see…
− Stacked display: Treemaps, dimensional stacking
TVCG ‘02

Dense Pixel Display
• Represent data case or a variable as a pixel
• Million or more per display
• Seems to rely on use of color
• Can pack lots in
• Challenge: What’s the layout?

One Representation
Each variable is in a window
Data cases in grid in each window
Similarity of window views tells you about similarity of dimensions
Uses color scale

Alternative
• Grouping arrangement
• Doesn’t use multiple windows
• Each data case has its own small rectangular icon
• Plot out variables for data point in that icon using a grid layout

Another View
Levkowitz, Vis ‘91

Example Large View

DB Applications
• Database of data items, each of n dimensions
• Issue a query that specifies a target value of the dimensions
• Often get back no exact matches
• Want to find near matches
D. Keim, H-P Kriegel, “VisDB Database Exploration Using Multid Vis”, IEEE CG&A, 1994.

Relevance Factor
• How close an item is to the query
• Data items have some value that can be numerically quantified
• Each dimension is some distance away from query item
• Sum these up for total distance
• Relevance is inverse of distance

Example
• 5 dimensions, integers 0->255
• Query: 6, 210, 73, 45, 92
• Data item: 8, 200, 73, 50, 91
• Distance: 2 + 10 + 0 + 5 + 1 = 18
• Relevance: 1275 - 18 = 1257 (1275 = 5 × 255, the largest possible distance)

Issues
• What if dimensions are real numbers or text strings?
• What if they’re the same type, but of different orders of magnitude?
• Have to define some kind of distance, then a weight function to multiply by

Technique
• Calculate relevance of all data points
• Sort items based on relevance
• Use spiral technique to order the values – Emanate out from center
• Color items based on relevance

Relevance Colors
[Figure: color scale running from High to Low relevance]
Empirically established

Technique
[Figure: grid cells numbered 0 to 10 in spiral order]

Spiral Method
Highest relevance value in center, decreasing values grow outward

Display Methodology
Example: five-dimensional data
[Figure: one window for Total relevance plus one window each for Dim 1, Dim 2, Dim 3, Dim 4, Dim 5]
Spiral in each window
Items ordered by total relevance
Same item appears in same place in each window

Example Display

Alternative
• Grouping arrangement
• Doesn’t use multiple windows
• Create all relevance dimensional depictions for an item and group them
• Spiral out the different data items’ depictions

Grouping Arrangement

Example Display
Multi-window vs. Grouping
8 dimensions, 1000 items

Related Idea
• Pixel Bar Chart
• Overload typical bar chart with more information about individual elements
Keim et al, Information Visualization ‘02

Idea 1
Height encodes quantity
Width encodes quantity

Idea 2
• Make each pixel
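
The document-collection Example slide (each word becomes a dimension, each value is a word count) can be illustrated with a minimal sketch. The function name `term_count_vectors` and the whitespace tokenization are assumptions made for illustration; the slides do not say how SPIRE or any other system tokenizes text.

```python
from collections import Counter
from typing import List, Tuple


def term_count_vectors(documents: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Turn documents into word-count vectors, one dimension per distinct word.

    Assumption: words are lowercased and split on whitespace.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # Accumulate all different words used throughout the collection.
    vocabulary = sorted({word for words in tokenized for word in words})
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        # Value in each dimension = number of times that word appears.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = term_count_vectors(["big data big", "data vis"])
    print(vocab)
    print(vecs)
```

Even this toy collection shows why the dimension count grows quickly: every new distinct word adds a column to every document's vector.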
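
The Relevance Factor and Technique slides (compute distance to the query, derive relevance, sort, then spiral out from the center) can be sketched as follows. This is a sketch under stated assumptions, not the VisDB implementation: it uses the worked example's rule (largest possible distance minus actual distance) with unweighted absolute differences, and the names `distance`, `relevance`, `spiral_cells`, and `spiral_layout` are hypothetical, as are the spiral's starting direction and turning sense.

```python
from typing import Dict, List, Sequence, Tuple


def distance(query: Sequence[int], item: Sequence[int]) -> int:
    """Sum of per-dimension absolute differences (no weighting assumed)."""
    return sum(abs(q - x) for q, x in zip(query, item))


def relevance(query: Sequence[int], item: Sequence[int], max_per_dim: int = 255) -> int:
    """Largest possible distance minus actual distance, as in the worked example."""
    return max_per_dim * len(query) - distance(query, item)


def spiral_cells(n: int) -> List[Tuple[int, int]]:
    """Return n grid offsets spiralling outward from (0, 0)."""
    if n <= 0:
        return []
    cells = [(0, 0)]
    x = y = 0
    dx, dy = 1, 0  # assumed starting direction
    step_len = 1
    while True:
        for _ in range(2):  # two legs share each step length
            for _ in range(step_len):
                if len(cells) >= n:
                    return cells
                x, y = x + dx, y + dy
                cells.append((x, y))
            dx, dy = -dy, dx  # quarter turn
        step_len += 1


def spiral_layout(query: Sequence[int], items: List[Sequence[int]]) -> Dict[Tuple[int, int], Sequence[int]]:
    """Place items on the spiral, most relevant item at the center."""
    ranked = sorted(items, key=lambda it: relevance(query, it), reverse=True)
    return dict(zip(spiral_cells(len(ranked)), ranked))


if __name__ == "__main__":
    query = (6, 210, 73, 45, 92)
    item = (8, 200, 73, 50, 91)
    print(distance(query, item), relevance(query, item))
```

Coloring each cell by its relevance value, and repeating the same cell order in one window per dimension, would give the multi-window display the slides describe; both steps are omitted here.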

