U of M CSCI 8715 - Spatial Outlier Detection and implementation in Weka

Spatial Outlier Detection and implementation in Weka

Implemented by: Shan Huang, Jisu Oh
CSCI8715 Class Project, April 27 2004
Presented by Jisu Oh (Group 2)
Slides available at http://www.users.cs.umn.edu/~joh/csci8715/HW-list.htm

Topics
- Motivation
- Problem Statement
- Key Concepts
- Major Contributions
- Validation Methodology
- Assumptions
- Conclusions
- Future work

Motivation
Machine learning / Data mining
- Enables a computer program to analyze large-scale data
- Identifies important information that can be used to make predictions or decisions faster and more accurately
Weka
- A collection of machine learning algorithms for solving real-world data mining problems
- Provides data mining functions (e.g., regression, association rules, and clustering algorithms)
- Limitation: operates on traditional non-spatial databases

Problem Statement
Input: Minneapolis/St. Paul traffic data set
Output: detected outliers, shown as
- Plain text (time slot, time, station, Zs(x))
- Overall traffic volume
- Neighbor relationship graph between stations

Problem Statement (contd.)
Constraints
- Algorithm from the paper "A Unified Approach to Detecting Spatial Outliers"
- Data set should be numeric
Objective
- To find sets of spatial outliers and show the results visually

Key Concepts
Spatial outliers
- Definition: spatially referenced objects whose non-spatial attribute values are significantly different from the values in their neighborhood
- Example: a new house in an old neighborhood of a growing metropolitan area
- In this project, an outlier is a station that has a high volume compared to its neighboring stations at a certain time slot

Key Concepts (contd.)
Algorithm
- Proposed in the paper "A Unified Approach to Detecting Spatial Outliers" by S. Shekhar, C. T. Lu, and P. Zhang
- $S(x) = f(x) - E_{y \in N(x)}(f(y))$: the difference between
  - $f(x)$: attribute value of a sensor located at x
  - $E_{y \in N(x)}(f(y))$: average attribute value of x's neighbors
- $Z_{S(x)} = \left| \frac{S(x) - \bar{s}}{\sigma_s} \right| > \theta$: spatial statistic, where $\bar{s}$ and $\sigma_s$ are the mean and standard deviation of $S(x)$, and $\theta$ is the z-score for a user-specified confidence interval

Key Concepts (contd.)
Algorithm (example)

   1   2   3   4   5
  20   6   7   8   9
   2   5  10  11  12
   7   8 100   2   1
   3   6   7   8   9

- $S(x) = f(x) - E_y = 100 - (2+8)/2 = 95$
- $\bar{s} = 0.22$, $\sigma_s = 23.8$
- $Z_{S(x)} = |S(x) - \bar{s}| / \sigma_s = 3.98$
- Z-score for 95% C.I. ≈ 2, and 3.98 > 2; thus 100 is an outlier
- The outlier is replaced by $E_y$: 100 -> 5

   1   2   3   4   5
  20   6   7   8   9
   2   5  10  11  12
   7   8   5   2   1
   3   6   7   8   9

Major Contributions
- Top-k outlier query processing
- User interface similar to the Weka UI
- Visualization of outliers
  - plain text (time slot, time, station, Zs(x))
  - overall traffic volume
  - neighbor relationship graph between stations
- Keeping user-specified results

Major Contributions (contd.)
Top-k outlier query processing
- Fig. 1. Top 3 outliers from data set 19970115N.dat
User interface
- Fig. 2. User interface of the spatial outlier detection application vs. Weka
Visualization of outliers
- Fig. 3. Plain text results of detected outliers
- Fig. 4. Overall traffic volume and neighbor relationship graph between stations (figure label: Detected outliers)
Keeping results
- Users can save and print user-specified results
Let's go to the DEMO!

Validation Methodology
Experiments with three different data sets

  Data set         Most outliers found at station
  19970115N.dat    24
  19970116N.dat    24
  19970125N.dat    124

Assumptions
- Data format is set
  - The original data consists of traffic volume and occupancy
  - Outlier detection is based on volume
  - Data format:
      @relation 19970115N
      @station 150
      @timeslot 288
      1 3 4 7 45 100 ….
- Users are familiar with statistical concepts (e.g., confidence interval, C.I.)

Conclusion
- Added one more package to Weka to find sets of spatial outliers
- Shows results visually
  - in a user interface similar to the Weka user interface
  - by top-k outlier query processing
  - providing visualization of outliers
  - allowing users to keep user-specified results

Future work
- Upgrade to allow various file formats and data types
- Experiments to find a more efficient algorithm using different outlier detection algorithms
- Add more spatial data mining options, e.g., SAR (Spatial Auto Regression), co-location

Thanks!
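
The statistic described under Key Concepts (neighborhood difference S(x), then a z-score test against θ) and the top-k query can be sketched in a few lines. This is only an illustrative Python sketch, not the project's Weka (Java) implementation: the function names `spatial_outliers` and `top_k_outliers`, the dictionary-based neighbor map, the use of the population standard deviation, and the sample volumes are all assumptions made for the example.

```python
from statistics import mean, pstdev


def spatial_outliers(values, neighbors, theta=2.0):
    """Return (location, z) pairs with z > theta, largest z first.

    values:    dict mapping location -> attribute value f(x)
    neighbors: dict mapping location -> non-empty list of neighbors N(x)
    theta:     z-score threshold (about 2 for a 95% confidence interval)
    """
    # S(x) = f(x) - average attribute value over the neighbors of x
    s = {x: values[x] - mean(values[y] for y in neighbors[x]) for x in values}

    s_bar = mean(s.values())
    sigma = pstdev(s.values())  # population std dev: an assumption of this sketch
    if sigma == 0:
        return []  # all S(x) are identical, so no location stands out

    # Z_S(x) = |S(x) - s_bar| / sigma, compared against theta
    z = {x: abs(s[x] - s_bar) / sigma for x in s}
    flagged = [(x, zx) for x, zx in z.items() if zx > theta]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


def top_k_outliers(values, neighbors, k, theta=2.0):
    """Return at most k flagged locations with the largest z-scores."""
    return spatial_outliers(values, neighbors, theta)[:k]


# Hypothetical volumes for ten stations along one road at a single time slot;
# each station's neighbors are assumed to be the adjacent stations.
volumes = [7, 8, 100, 2, 1, 3, 6, 7, 8, 9]
values = dict(enumerate(volumes))
neighbors = {
    i: [j for j in (i - 1, i + 1) if 0 <= j < len(volumes)]
    for i in range(len(volumes))
}

outliers = spatial_outliers(values, neighbors)
```

With these made-up volumes, only station 2 (volume 100, neighbors 8 and 2, so S(x) = 95) exceeds θ = 2. Passing the neighbor relation as an explicit map keeps the sketch independent of how stations are actually connected in the traffic data set.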

