U of M CSCI 8715 - Spatial Outlier Detection and implementation in Weka

School: University of Minnesota - Twin Cities

Course: CSCI 8715 - Spatial Databases and Applications

Pages 21

Spatial Outlier Detection and implementation in Weka

Implemented by: Shan Huang, Jisu Oh
CSCI8715 Class Project, April 27 2004
Presented by Jisu Oh (Group 2)
Slides available at http://www.users.cs.umn.edu/~joh/csci8715/HW-list.htm

Topics
- Motivation
- Problem Statement
- Key Concepts
- Major Contributions
- Validation Methodology
- Assumptions
- Conclusions
- Future work

Motivation
Machine learning / data mining
- Enables a computer program to analyze large-scale data
- Identifies important information that can be used to make predictions or decisions faster and more accurately

Motivation
Weka
- A collection of machine learning algorithms for solving real-world data mining problems
- Provides data mining functions (e.g., regression, association rules, and clustering algorithms)
- Limitation: operates only on traditional non-spatial databases

Problem Statement
Input
- Minneapolis/St. Paul traffic data set
Output: detected outliers, presented as
- Plain text (time slot, time, station, $Z_{S(x)}$)
- Overall traffic volume
- Neighbor relationship graph between stations

Problem Statement (contd.)
Constraints
- The algorithm is taken from the paper "A Unified Approach to Detecting Spatial Outliers"
- The data set must be numeric
Objective
- To find sets of spatial outliers and show the results visually

Key Concepts
Spatial outliers
- Definition: spatially referenced objects whose non-spatial attribute values differ significantly from those of their neighborhood
- Example: a new house in an old neighborhood of a growing metropolitan area
- In this project, an outlier is a station that has a high volume compared to its neighboring stations in a given time slot

Key Concepts (contd.)
Algorithm
- Proposed in the paper "A Unified Approach to Detecting Spatial Outliers" by S. Shekhar, C. T. Lu, and P. Zhang
- $S(x) = f(x) - E_{y \in N(x)}(f(y))$: the difference between $f(x)$, the attribute value of the sensor located at $x$, and $E_{y \in N(x)}(f(y))$, the average attribute value of $x$'s neighbors
- $Z_{S(x)} = \left| \frac{S(x) - \bar{s}}{\sigma_s} \right| > \theta$: the spatial statistic test, where $\bar{s}$ and $\sigma_s$ are the mean and standard deviation of $S(x)$, and $\theta$ is the z-score for a user-specified confidence interval

Key Concepts (contd.)
Algorithm (example)
Values before replacement:
   1   2   3   4   5
  20   6   7   8   9
   2   5  10  11  12
   7   8 100   2   1
   3   6   7   8   9
- $S(x) = f(x) - E_y = 100 - (2 + 8)/2 = 95$
- $\bar{s} = 0.22$, $\sigma_s = 23.8$
- $Z_{S(x)} = |S(x) - \bar{s}| / \sigma_s = 3.98$
- The z-score for a 95% C.I. is about 2, and 3.98 > 2; thus 100 is an outlier
- The outlier is replaced by $E_y$: 100 -> 5
Values after replacement:
   1   2   3   4   5
  20   6   7   8   9
   2   5  10  11  12
   7   8   5   2   1
   3   6   7   8   9

Major Contributions
- Top-k outliers query processing
- A user interface similar to that of Weka
- Visualization of outliers
  - Plain text (time slot, time, station, $Z_{S(x)}$)
  - Overall traffic volume
  - Neighbor relationship graph between stations
- Keeping user-specified results

Major Contributions (contd.)
Top-k outliers query processing
Fig. 1. Top 3 outliers from data set 19970115N.dat

Major Contributions (contd.)
User interface
Fig. 2. User interface of the spatial outlier detection application vs. Weka

Major Contributions (contd.)
Visualization of outliers
Fig. 3. Plain text results of detected outliers

Major Contributions (contd.)
Visualization of outliers
Fig. 4. Overall traffic volume and neighbor relationship graph between stations (detected outliers marked)

Major Contributions (contd.)
Visualization of outliers
Fig. 4. Overall traffic volume and neighbor relationship graph between stations

Major Contributions (contd.)
Keeping results
- Lets the user save and print user-specified results
Let's go to the DEMO!

Validation Methodology
Experiments with three different data sets

Data set         Most outliers found at station
19970115N.dat    24
19970116N.dat    24
19970125N.dat    124

Assumptions
The data format is fixed
- The original data consists of traffic volume and occupancy
- Outlier detection is based on volume
- Data format:
    @relation 19970115N
    @station 150
    @timeslot 288
    1 3 4 7 45 100 ….
Users are familiar with statistical concepts (e.g., confidence interval, C.I.)

Conclusion
- Added one more package to Weka to find sets of spatial outliers
- Shows results visually
  - in a user interface similar to that of Weka
  - with top-k outliers query processing
  - with visualization of outliers
  - with the option to keep user-specified results

Future work
- Support additional file formats and data types
- Experiment with different outlier detection algorithms to find a more efficient one
- Add more spatial data mining options, e.g., SAR (Spatial Auto Regression), co-location

Thanks!
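
The spatial statistic test from the Key Concepts slides can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's Weka (Java) implementation: the function names `z_statistic`, `spatial_outliers`, and `chain_neighbors` are hypothetical, the population standard deviation is an assumption (the slides do not say which estimator was used), and the left/right neighbor chain is a simplification of the station neighbor graph.

```python
from statistics import mean, pstdev


def z_statistic(s_x, s_bar, sigma_s):
    """Z_S(x) = |(S(x) - mean of S) / std of S|."""
    return abs((s_x - s_bar) / sigma_s)


def chain_neighbors(n):
    """Hypothetical neighbor graph: station i neighbors i-1 and i+1."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


def spatial_outliers(values, neighbors, theta=2.0):
    """Return {station index: Z_S(x)} for stations with Z_S(x) > theta.

    values[i]    -- attribute value f(x) at station i (e.g., volume)
    neighbors[i] -- list of station indices adjacent to station i
    theta        -- z-score threshold (about 2 for a 95% C.I.)
    """
    # S(x) = f(x) - average attribute value of x's neighbors
    s = [values[i] - mean(values[j] for j in neighbors[i])
         for i in range(len(values))]
    s_bar = mean(s)
    sigma_s = pstdev(s)  # assumption: population standard deviation
    if sigma_s == 0:
        return {}  # all differences identical, nothing stands out
    z = {i: z_statistic(s_i, s_bar, sigma_s) for i, s_i in enumerate(s)}
    return {i: z_i for i, z_i in z.items() if z_i > theta}


if __name__ == "__main__":
    # Arithmetic from the example slide: S(x) = 95, mean 0.22, std 23.8
    print(round(z_statistic(95, 0.22, 23.8), 2))  # 3.98

    # Made-up data: eleven stations with volume 5, one spike of 100
    volumes = [5] * 11
    volumes[5] = 100
    print(list(spatial_outliers(volumes, chain_neighbors(11))))  # [5]
```

Note that with very few stations the statistic cannot exceed the threshold at all (for n values and a population standard deviation, the largest possible z is the square root of n - 1), so a test of this kind needs a reasonably large neighborhood graph to be meaningful.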
