
Visual Data Mining
Chidroop Madhavarapu
CSE 591: Visual Analytics

Motivation
Visualization for Data Mining
• Huge amounts of information
• Limited display capacity of output devices

Visual Data Mining (VDM) is a new approach for exploring very large data sets, combining traditional mining methods and information visualization techniques.

Why Visual Data Mining
Integration of visualization and data mining: Visual Data Mining approaches fall under 3 categories:
• Data Mining process visualization.
• Data Mining result visualization.
• Interactive Visual Data Mining.

Data Mining process visualization
Visualization techniques are used to support Data Mining.
Ex: when large amounts of multidimensional data must be handled in the form of data tables or relational databases (parallel coordinates, scatter plots, etc.).

Data Mining result visualization
Visually conveys the results of mining tasks, such as clustering or classification, to enhance user interpretation.
Examples include scatter plots, box plots, BLOB and H-BLOB clustering algorithms, decision trees, and association rules.

Interactive Visual Data Mining
Rather than using visual data exploration and analytical mining algorithms as separate tools, a stronger DM strategy is to tightly couple the visualizations and analytical processes into one DM tool.
Visualization tools are used in the data mining process to help users make smart data mining decisions.
Examples include the Control project, OptiGrid, and PBC (Perception Based Classification).

V-Miner: Using Enhanced Parallel Coordinates to Mine Product Design and Test Data
Kaidi Zhao, Bing Liu, Thomas Tirpak, Andreas Schaller
University of Illinois, Chicago
Motorola Labs

INTRODUCTION
• V-Miner: a multivariable visualization tool.
• Designed for mining product design and test data.
• New technique based on parallel coordinate visualization.
• Goal is to discover useful knowledge from mobile phone testing data that can be used to provide feedback to the design engineers.

Design Process for consumer electronics
1. Engineers design specific sections of the phone based on previous successful designs, new product specs, design simulations, etc.
2. Prototypes are built.
3. Functional tests are performed on the prototypes.
4. If the requirements are not met, start the next design cycle (from step 1).
The above steps are repeated until the design meets the specification. Then the phone is released to the NPI team for volume manufacturing.
• For a new product, the number of iterations of design revisions should be coordinated.
• Hundreds of variables are involved, which are changed/tested in the different revisions.
• V-Miner is used to reduce engineering costs and design defects by mining useful knowledge from the test data.

THE DATA
After each design change, all test variables are measured. Each variable takes numerical values and has the following properties:
• It has an upper limit and a lower limit. A value that does not fall in this range is unacceptable.
• It has an ideal value called the target value.

SAMPLE TEST DATA
• Each change is a new design.
• The data is a sequential set.
• Subsequent changes are based on earlier changes.

With the testing data, designers are interested in:
• Significant changes in variables after a design change.
• The cause of these changes.
• Stable variables whose values are not affected by design changes.

Traditional mining algorithms are not adequate here because:
• Due to the large number of variables, association rule mining generates too many rules.
• Decision trees do not find all interesting patterns, only a subset of them.
To solve the problem, we can use parallel coordinates, which give an intuitive view of the underlying data.

Parallel Coordinates Overview

Problems with the traditional parallel coordinates technique
• It does not consider the sequence in which the data was generated.
  Sol: Add a sequence component to the traditional parallel coordinate visualization, i.e. add trend figures.
• It does not consider the ordering of the variables.
  Sol: A querying and sorting tool is implemented to enable users to issue queries and rearrange the axes accordingly.
So, design an Enhanced Parallel Coordinates system.

TREND FIGURES
• The order of data records is of high significance, as it might reveal sequence-dependent relations.
• Extend the existing system by adding an additional graph for each variable above its coordinate.
• Thus it is possible to quickly see variables that change in similar ways by comparing the trend figures.

QUERYING AND SORTING
Allows the user to query shapes based on approximate pattern matching. Two main types of pattern: value change pattern and failure pattern.
• A value change pattern indicates how a variable’s value changes over different design changes.
  up: 3, down: 1, stable: 2
  Example: 3312
• A failure pattern indicates whether the value falls within the upper and lower limits after the design modification.
  F: failure, O: ok
  Example: OOOFF
String comparison is more convenient and intuitive for human users.
The variables in the parallel coordinate visualization are ordered according to the comparison results.

Need for Data Mining
The goal of the application is to enable engineers at Motorola to identify the following:
• Variables that show prominent changes in their values after some design changes.
• Stable variables that aren't affected by the design changes.
• Failure patterns of variables that failed after certain design changes.
• Variables that have similar value change patterns.

DATA NORMALIZATION
Variables whose values are out of range are normalized to values either larger than 1 or less than -1. Normalized values close to 0 are the ones close to the target values.

Procedure normalization (value, min, max, target)
// return value stored in: normalized_value
if ((value >= min) && (value <= max)) then
    normalized_value = (value - target) / (max - min);
else
    if (value > max) then
        normalized_value = (value - target) / (max - min) + 1;
    else // value < min
        normalized_value = (value - target) / (max - min) - 1;
    end-if
end-if

KEY FEATURES
• Data in different designs are visualized using different colors.
• For each variable, a trend figure is drawn at the top of the screen.
The user can identify significant characteristics from the visualization:
• Which variables are out of range or within range (ex. 19, 20).
• Variables that behave similarly, judging from the trend figures (ex. 33, 34).
• Variables that have stable values over all design changes (ex. 15).

In classical parallel coordinate
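
To make the normalization and the two query patterns from the slides more concrete, here is a minimal Python sketch. It is not V-Miner's implementation: the function names, the `tolerance` parameter, the choice to emit one code per consecutive pair of records, and the use of `difflib.SequenceMatcher` as a stand-in for "approximate pattern matching" are all assumptions made for illustration. The slides do not say which string comparison the tool uses.

```python
from difflib import SequenceMatcher


def normalize(value, lo, hi, target):
    """Normalization following the pseudocode on the DATA NORMALIZATION slide.

    In-range values map into [-1, 1] (assuming lo <= target <= hi);
    out-of-range values are pushed beyond +1 or -1.
    """
    scaled = (value - target) / (hi - lo)
    if value > hi:
        return scaled + 1
    if value < lo:
        return scaled - 1
    return scaled


def value_change_pattern(values, tolerance=0.0):
    """Encode each consecutive change as up=3, down=1, stable=2.

    `tolerance` (hypothetical) is how much two values may differ
    and still count as stable.
    """
    codes = []
    for prev, curr in zip(values, values[1:]):
        if curr - prev > tolerance:
            codes.append("3")
        elif prev - curr > tolerance:
            codes.append("1")
        else:
            codes.append("2")
    return "".join(codes)


def failure_pattern(values, lo, hi):
    """Encode each record as O (within limits) or F (outside limits)."""
    return "".join("O" if lo <= v <= hi else "F" for v in values)


def rank_by_similarity(query, patterns):
    """Order variable names by similarity of their pattern to the query.

    `patterns` maps a variable name to its pattern string. The similarity
    measure (SequenceMatcher ratio) is an assumed stand-in, not the tool's.
    """
    def score(name):
        return SequenceMatcher(None, query, patterns[name]).ratio()

    return sorted(patterns, key=score, reverse=True)
```

For example, `value_change_pattern([1.0, 2.0, 3.0, 2.5, 2.5])` gives `"3312"` and `failure_pattern([5, 6, 7, 11, 12], 0, 10)` gives `"OOOFF"`, matching the two examples on the QUERYING AND SORTING slide. The list returned by `rank_by_similarity` could then serve as an axis order in which the variables most similar to the query come first.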

