
Name: Jisu Oh, Shan Huang
Date: March 2, 2004
Course: CSCI 8715
Professor: Shashi Shekhar

Project Proposal
“Spatial Outlier Detection”

1. Introduction
A spatial outlier is a spatially referenced object whose non-spatial attribute values are significantly different from the values of its neighborhood. Identifying spatial outliers can lead to the discovery of unexpected, interesting, and useful spatial patterns for further analysis. WEKA is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. Basic data mining functions, as well as regression, association rule, and clustering algorithms, have been implemented in WEKA, but these algorithms operate only on traditional non-spatial databases. The purpose of this project is to build a new class that can detect spatial outliers in a spatial data set.

2. Related works
Detecting spatial outliers is useful in many applications of geographic information systems, including transportation, ecology, public safety, public health, climatology, and location-based services [2]. Shekhar et al. introduced a method for detecting spatial outliers in graph data sets based on the distributional properties of the difference between an attribute value and the average attribute value of its neighbors [3]. Shekhar also proposed an algorithm for finding all outliers in a data set that replaces many statistical discordance tests and requires no knowledge of the underlying distribution of the attributes [7]. Stephen D. Bay et al. introduced a simple nested-loop algorithm for detecting distance-based outliers, which achieves near-linear time performance when the data are in random order and a simple pruning rule is used [4]. Existing methods for finding outliers can deal efficiently with only two dimensions (attributes) of a data set.
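The neighborhood-difference idea behind [3] can be made concrete with a small sketch. It is written in Java, like WEKA, but it is not WEKA code: the class name GridSpatialOutliers, the choice of the up-to-eight adjacent grid cells as the neighborhood N(x), the use of the population standard deviation, and the threshold value are all assumptions made for illustration. For each cell x it computes S(x) = f(x) minus the average of f over the neighbors of x, standardizes S(x), and flags the cells whose absolute standardized value exceeds a threshold theta.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not WEKA code): neighborhood-difference spatial outlier
 * test on a 2D grid. The 8-cell neighborhood and the population standard
 * deviation are illustrative assumptions.
 */
public class GridSpatialOutliers {

    /** S(x) = f(x) minus the average of f over the (up to 8) adjacent cells of x. */
    public static double[][] neighborhoodDifference(double[][] f) {
        int rows = f.length, cols = f[0].length;
        double[][] s = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double sum = 0.0;
                int count = 0;
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        if (dr == 0 && dc == 0) continue; // x is not its own neighbor
                        int nr = r + dr, nc = c + dc;
                        if (nr >= 0 && nr < rows && nc >= 0 && nc < cols) {
                            sum += f[nr][nc];
                            count++;
                        }
                    }
                }
                // A 1 x 1 grid has no neighbors; treat its difference as 0.
                s[r][c] = (count == 0) ? 0.0 : f[r][c] - sum / count;
            }
        }
        return s;
    }

    /** Returns {row, col} of every cell with |(S(x) - mean) / stdDev| > theta. */
    public static List<int[]> detect(double[][] f, double theta) {
        double[][] s = neighborhoodDifference(f);
        int n = 0;
        double mean = 0.0;
        for (double[] row : s) {
            for (double v : row) { mean += v; n++; }
        }
        mean /= n;
        double var = 0.0;
        for (double[] row : s) {
            for (double v : row) var += (v - mean) * (v - mean);
        }
        double std = Math.sqrt(var / n); // population standard deviation (assumption)
        List<int[]> outliers = new ArrayList<>();
        if (std == 0.0) return outliers; // all S(x) equal: no cell stands out
        for (int r = 0; r < s.length; r++) {
            for (int c = 0; c < s[r].length; c++) {
                if (Math.abs((s[r][c] - mean) / std) > theta) {
                    outliers.add(new int[] {r, c});
                }
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        double[][] grid = new double[5][5];
        for (double[] row : grid) Arrays.fill(row, 10.0);
        grid[2][2] = 50.0; // one cell far from its neighbors' values
        for (int[] cell : detect(grid, 2.0)) {
            System.out.println("outlier at (" + cell[0] + ", " + cell[1] + ")"); // prints "outlier at (2, 2)"
        }
    }
}
```

On a 5 × 5 grid of constant value 10 with a single cell set to 50, only that cell has a standardized score above 2 (about 4.7, versus about 0.6 for its eight neighbors), so the program prints its coordinates. A different neighborhood definition, for example only the four edge-adjacent cells, would change just the inner loop of neighborhoodDifference.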
Sridhar Ramaswamy et al. introduced a distance-based detection method that ranks each point by its distance to its kth nearest neighbor and declares the top n points in this ranking to be outliers; the same paper also introduced a highly efficient partition-based algorithm [6]. Edwin M. Knorr et al. proposed another distance-based outlier detection method that can be computed efficiently for large data sets, and for k-dimensional data sets with large values of k [9]. Spatial outliers are most often represented as point data, but they are also frequently represented as regions, i.e., groups of points. Jiang Zhao et al. proposed a wavelet-analysis-based approach to detecting region outliers [5]. Markus M. Breunig et al. took a different approach to detecting outliers: each object is assigned a degree of being an outlier, called its local outlier factor, which depends on how isolated the object is with respect to its surrounding neighborhood [10].

3. Problem Definition
1) Input: a data set that includes a spatial attribute (the location, given as a cell of a 2D grid) and non-spatial attributes.
2) Output: the set of spatial outliers.
3) Constraints: the definition of a spatial outlier and the algorithm used to find spatial outliers.
a. A spatial outlier is a spatially referenced object whose non-spatial attribute values are significantly different from those of other spatially referenced objects in its spatial neighborhood.
b. The algorithm used in this project was proposed in the paper “A Unified Approach to Detecting Spatial Outliers” [7]. Each location is compared to its neighborhood using the function
$$S(x) = \left[ f(x) - E_{y \in N(x)}\big(f(y)\big) \right]$$
where
$f(x)$: the attribute value of location $x$
$N(x)$: the set of neighbors of $x$
$E_{y \in N(x)}(f(y))$: the average attribute value of the neighbors of $x$
$S(x)$: the difference between the attribute value at location $x$ and the average attribute value of $x$'s neighbors
c. Spatial statistics are used to detect spatial outliers when $f(x)$ is normally distributed. A location $x$ is reported as a spatial outlier if
$$Z_{s(x)} = \left| \frac{S(x) - \mu_s}{\sigma_s} \right| > \theta$$
where
$\mu_s$: the mean value of $S(x)$
$\sigma_s$: the standard deviation of $S(x)$
$\theta$: a specified confidence level
4) Objective: the objective of the project is to find the outliers in a given data set that has both spatial and non-spatial attributes.

4. Methodology
We will construct several experiments that test how accurately outliers are found in different spatial data sets, and compare the efficiency of two different algorithms.
1) Dataset
In this project, the input data set will be the 16 × 16 gray-scale image provided on page 192 of the textbook (Shekhar and Chawla, “Spatial Databases: A Tour”, 2003). The image will be represented as a two-dimensional 16 × 16 matrix.
2) Case study
We will find sets of outliers in different data sets and then analyze how accurately they are found.

5. Contributions
The major contribution of this project is the development of an application that finds spatial outliers using the WEKA system. WEKA provides basic data mining functions, but they work only on non-spatial databases. Building a new class that can detect sets of spatial outliers in a given spatial data set, and incorporating this class into the existing WEKA system, will enable the discovery of unexpected, interesting, and useful spatial patterns for further analysis.

References
[1] Exploratory Analysis of Spatial Data
[2] Algorithms for Spatial Outlier Detection, Chang-Tien Lu, Dechang Chen, Yufeng Kou
[3] Detecting Graph-Based Spatial Outliers: Algorithms and Applications (A Summary of Results), Shashi Shekhar, Chang-Tien Lu, Pusheng Zhang
[4] Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule, Stephen D. Bay, Mark Schwabacher
[5] Detecting Region Outliers in Meteorological Data, Jiang Zhao, Chang-Tien Lu, Yufeng Kou
[6] Efficient Algorithms for Mining Outliers from Large Data Sets, Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim
[7] A Unified Approach to Detecting Spatial Outliers, S. Shekhar, C. T. Lu, P. Zhang, GeoInformatica, 2003
[8] A Unified Approach for Mining Outliers, Edwin M. Knorr, Raymond T. Ng
[9] Distance-Based Outliers: Algorithms and Applications, Edwin M. Knorr, Raymond T. Ng, Vladimir Tucakov
[10] LOF: Identifying Density-Based Local Outliers, Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg