U of M CSCI 8715 - Trends in Spatial Data Mining - D1540710



School name: University of Minnesota-Twin Cities

Course: CSCI 8715 - Spatial Databases and Applications

Pages: 24


Chapter 3
Trends in Spatial Data Mining

Shashi Shekhar∗, Pusheng Zhang∗, Yan Huang∗, Ranga Raju Vatsavai∗
∗Department of Computer Science and Engineering, University of Minnesota, 4-192, 200 Union St SE, Minneapolis, MN 55455

Abstract: Spatial data mining is the process of discovering interesting and previously unknown, but potentially useful, patterns from large spatial datasets. Extracting interesting and useful patterns from spatial datasets is more difficult than extracting the corresponding patterns from traditional numeric and categorical data because of the complexity of spatial data types, spatial relationships, and spatial autocorrelation. This chapter focuses on the unique features that distinguish spatial data mining from classical data mining. Major accomplishments and research needs in spatial data mining are discussed.

Keywords: Spatial Data Mining, Spatial Autocorrelation, Location Prediction, Spatial Outliers, Co-location, Spatial Statistics, Research Needs

3.1 Introduction

The explosive growth of spatial data and the widespread use of spatial databases emphasize the need for the automated discovery of spatial knowledge. Spatial data mining [Roddick & Spiliopoulou1999, Shekhar & Chawla2003] is the process of discovering interesting and previously unknown, but potentially useful, patterns from spatial databases. The complexity of spatial data and the intrinsic spatial relationships limit the usefulness of conventional data mining techniques for extracting spatial patterns. Efficient tools for extracting information from geo-spatial data are crucial to organizations that make decisions based on large spatial datasets, including NASA, the National Imagery and Mapping Agency (NIMA), the National Cancer Institute (NCI), and the United States Department of Transportation (USDOT).
These organizations span many application domains, including ecology and environmental management, public safety, transportation, Earth science, epidemiology, and climatology.

General-purpose data mining tools, such as Clementine, See5/C5.0, and Enterprise Miner, are designed to analyze large commercial databases. Although these tools were primarily designed to identify customer-buying patterns in market basket data, they have also been used to analyze scientific and engineering data, astronomical data, multimedia data, genomic data, and web data. Extracting interesting and useful patterns from spatial datasets is more difficult than extracting the corresponding patterns from traditional numeric and categorical data because of the complexity of spatial data types, spatial relationships, and spatial autocorrelation.

Specific features of geographical data that preclude the use of general-purpose data mining algorithms are: (i) rich data types (e.g., extended spatial objects), (ii) implicit spatial relationships among the variables, (iii) observations that are not independent, and (iv) spatial autocorrelation among the features. In this chapter we focus on the unique features that distinguish spatial data mining from classical data mining in four categories: data input, statistical foundation, output patterns, and computational process. We present major accomplishments of spatial data mining research, especially regarding the output patterns known as predictive models, spatial outliers, spatial co-location rules, and clusters. Finally, we identify areas of spatial data mining where further research is needed.

3.2 Data Input

The data inputs of spatial data mining are more complex than the inputs of classical data mining because they include extended objects such as points, lines, and polygons. The data inputs of spatial data mining have two distinct types of attributes: non-spatial attributes and spatial attributes.
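To make the two attribute types concrete, the following plain-Python sketch keeps a city's non-spatial attributes apart from its spatial ones. It also previews the materialization strategy discussed in this section: the implicit metric relationship "within a given distance" is computed from coordinates and stored as an ordinary column. The `City` record, the helper names, and the use of great-circle (haversine) distance on a spherical Earth are all illustrative assumptions, not part of the chapter.

```python
import math
from dataclasses import dataclass, field


@dataclass
class City:
    """Hypothetical record for one spatial object (field names are illustrative)."""
    # Non-spatial attributes: the same kind of columns classical data mining uses.
    name: str
    population: int
    unemployment_rate: float
    # Spatial attributes: location in degrees and, optionally, shape.
    longitude: float
    latitude: float
    boundary: list = field(default_factory=list)  # polygon vertices as (lon, lat) pairs


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def materialize_neighbor_counts(cities, max_km):
    """Turn the implicit relation "within max_km of" into one explicit numeric column.

    The result is a flat table a classical miner can consume; which cities are
    neighbors, and how far apart they are, is discarded along the way.
    """
    table = []
    for city in cities:
        count = sum(
            1
            for other in cities
            if other is not city
            and haversine_km(city.longitude, city.latitude,
                             other.longitude, other.latitude) <= max_km
        )
        table.append({
            "name": city.name,
            "population": city.population,
            "unemployment_rate": city.unemployment_rate,
            "neighbors_within_max_km": count,
        })
    return table
```

The distance threshold is a modeling choice; a different `max_km` yields a different column, which is one reason materialized relationships are a lossy stand-in for the spatial data itself.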
Non-spatial attributes are used to characterize non-spatial features of objects, such as the name, population, and unemployment rate of a city. They are the same as the attributes used in the data inputs of classical data mining. Spatial attributes are used to define the spatial location and extent of spatial objects [Bolstad2002]. The spatial attributes of a spatial object most often include information related to spatial location, e.g., longitude, latitude, and elevation, as well as shape.

Relationships among non-spatial objects are explicit in data inputs, e.g., arithmetic relation, ordering, is instance of, subclass of, and membership of. In contrast, relationships among spatial objects are often implicit, such as overlap, intersect, and behind. One possible way to deal with implicit spatial relationships is to materialize the relationships into traditional data input columns and then apply classical data mining techniques [Quinlan1993, Barnett & Lewis1994, Agrawal & Srikant1994, Jain & Dubes1988]. However, the materialization can result in loss of information. Another way to capture implicit spatial relationships is to develop models or techniques that incorporate spatial information into the spatial data mining process. We discuss a few case studies of such techniques in Section 3.4.

Non-spatial Relationship (Explicit)    Spatial Relationship (Often Implicit)
Arithmetic                             Set-oriented: union, intersection, membership, ...
Ordering                               Topological: meet, within, overlap, ...
Is instance of                         Directional: North, NE, left, above, behind, ...
Subclass of                            Metric: e.g., distance, area, perimeter, ...
Part of                                Dynamic: update, create, destroy, ...
Membership of                          Shape-based and visibility

Table 3.1: Relationships among Non-spatial Data and Spatial Data

3.3 Statistical Foundation

Statistical models [Cressie1993] are often used to represent observations in terms of random variables. These models can then be used for estimation, description, and prediction based on probability theory.
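Classical models often treat observations as independent random variables, whereas spatial observations tend to be autocorrelated (features (iii) and (iv) listed in Section 3.1). A common measure of spatial autocorrelation is global Moran's I; it is brought in here as an illustration and is not taken from the chapter. The sketch below computes it in plain Python on a regular lattice with rook (edge-sharing) contiguity weights; both the lattice and the weighting scheme are assumptions chosen for simplicity.

```python
def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a rows x cols lattice (cells in row-major order)."""
    n = rows * cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            # Neighbors share an edge: up, down, left, right.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i][rr * cols + cc] = 1.0
    return w


def morans_i(values, w):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z_i = x_i - mean."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]
    s0 = sum(sum(row) for row in w)  # S0: total of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)
```

Values near +1 indicate that similar values cluster together, values near -1 indicate checkerboard-like alternation, and under spatial randomness the expected value is -1/(n-1), close to zero for large n.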
Spatial data can be thought of as resulting from observations on the stochastic process {Z(s) : s ∈ D}, where s is a spatial location and D is possibly a random set in a spatial framework. Here we present three spatial statistical problems one might encounter: point processes, lattices, and geostatistics.

Point process: A point process is a model for the spatial distribution of the points in a point pattern. Several natural processes can be modeled as spatial point patterns, e.g., positions of trees in a forest and locations of bird habitats in a wetland. Spatial point patterns can be broadly grouped into random or non-random processes. Real point patterns are often compared with a random pattern (generated by a Poisson process) using the average distance
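The comparison with a Poisson pattern described above can be sketched with the Clark-Evans nearest-neighbor ratio. Using this particular statistic is an assumption about how the average-distance comparison is carried out, chosen because it is a standard test of this kind; it is not code from the chapter. The helper names are hypothetical, the study region is assumed to be a rectangle, and edge effects are ignored.

```python
import math
import random


def mean_nn_distance(points):
    """Average distance from each point to its nearest other point (brute force, O(n^2)).

    Requires at least two points.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)


def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbor distance over its expectation under a Poisson process.

    For a homogeneous Poisson process with intensity lam = n / area, the expected
    nearest-neighbor distance is 1 / (2 * sqrt(lam)); edge effects are ignored.
    R near 1 suggests a random pattern, R < 1 clustering, R > 1 regular spacing.
    """
    lam = len(points) / area
    expected = 1.0 / (2.0 * math.sqrt(lam))
    return mean_nn_distance(points) / expected


def random_pattern(n, width, height, rng):
    """n points from a homogeneous Poisson process, conditioned on the count n.

    Given exactly n points in the rectangle, such a process places them
    independently and uniformly at random.
    """
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
    print(clark_evans_ratio(grid, area=100.0))  # regular grid: 2.0
    print(clark_evans_ratio(random_pattern(500, 1.0, 1.0, rng), area=1.0))
```

Brute-force search keeps the sketch short; for large point sets a spatial index such as a k-d tree would be the usual replacement.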
