University of California, Los Angeles
Department of Statistics

Statistics C173/C273
Instructor: Nicolas Christou

Introduction

What is geostatistics?

Geostatistics is concerned with estimation and prediction for spatially continuous phenomena, using data obtained at a limited number of spatial locations. Here, by phenomena we mean the distribution in two- or three-dimensional space of one or more random variables, called regionalized variables. The phenomenon that the regionalized variables describe is called a regionalization. Examples are the distribution of mineral ore grades in three-dimensional space, or the distribution of ozone.

History

The term geostatistics was coined by Georges Matheron (1962). Matheron and his colleagues at Fontainebleau, France, used this term in prediction for problems in the mining industry. The prefix "geo" concerns data related to the earth. Today, geostatistical methods are applied in many areas beyond mining, such as soil science, epidemiology, ecology, forestry, meteorology, astronomy, crop science, environmental sciences, and in general wherever data are collected at geographical (spatial) locations.

The spatial locations throughout the course will be denoted by $s_1, s_2, \ldots, s_n$, and the spatial data collected at these locations will be denoted by $z(s_1), z(s_2), \ldots, z(s_n)$. Spatial locations are determined by their coordinates $(x, y)$. We will mainly focus on data in two-dimensional space. The distance between data points is very important in the analysis of spatial data. We will mostly use Euclidean distances. Suppose data point $s_i$ has coordinates $(x_i, y_i)$ and data point $s_j$ has coordinates $(x_j, y_j)$. The Euclidean distance between points $s_i$ and $s_j$ is given by

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}.$$

Other forms of distance can be used: great circle distance, azimuth distance, travel distance from point to point, time needed to get from point to point, etc.

The problem:

- Present and explain the distribution of the random function $\{Z(s) : s \in D\}$.
- Predict the value of the function $Z(s)$ at spatial location $s_0$, in other words the value $z(s_0)$, using the observed data vector $z(s_1), z(s_2), \ldots, z(s_n)$ (see figure below).

[Figure: sampled locations $s_1, \ldots, s_7$ and the prediction location $s_0$. Axes: x coordinate, y coordinate.]

Environmental protection agencies set maximum thresholds for harmful substances in the soil, atmosphere, and water. Therefore, given the data, we would also like to know the probabilities that the true values exceed these thresholds at unsampled locations.

A random function $Z(s)$ can be seen as a set of random variables $Z(s_i)$ defined at each point $s_i$ of the random field $D$:

$$Z(s) = \{Z(s_i) : s_i \in D\}.$$

These random variables are correlated, and this correlation depends on the vector $h$ that separates two points $s$ and $s + h$, on the direction (south-north, east-west, etc.), and also on the nature of the variables considered. The data can be thought of as a realization of the function $Z(s)$, with $s$ varying continuously throughout the region $D$.

Geostatistical theory is based on the assumption that the variability of regionalized variables follows a specific pattern. For example, the ozone level $z(s)$ at location $s$ is autocorrelated with the ozone level $z(s + h)$ at location $s + h$. Intuitively, locations close to one another tend to have similar values, while locations farther apart differ more on average. Geostatistics quantifies this intuitive fact and uses it to make predictions.

Motivating examples

Example 1: Surface elevations

For these data, the coordinates $(x, y)$ and the elevation were recorded at 52 locations, as shown below.

[Figure: Circle plot of the surface elevation data. Axes: X Coord, Y Coord.]

The circles are centered at the sampling locations given by the coordinates, and the radius of each circle is determined by a linear transformation of the elevations. Also observe that the circles are filled with shades of grey.

The objective in analyzing these data is to construct a continuous elevation map, resulting in a raster map. The raster map below shows the elevation of an area in southwest Wake County, North Carolina, USA.

[Figure: raster map of elevation, southwest Wake County.]

Example 2

The data below were collected from the flooded banks of the Meuse river (in Dutch, the Maas river).

[Figure: The data points. Axes: Longitude, Latitude.]

Concentration of lead and zinc:

[Figure: bubble plots of lead concentration (ppm) and zinc concentration (ppm). Legend keys: lead 37, 72.5, 123, 207, 654; zinc 113, 198, 326, 674.5, 1839.]

According to the United States Environmental Protection Agency (US EPA), the level of risk for surface soil based on lead concentration in ppm is given in the table below.

    Mean concentration (ppm)    Level of risk
    Below 150                   Lead free
    Between 150 and 400         Lead safe
    Above 400                   Significant environmental lead hazard

Construction of a grid:

[Figure: construction of a grid.]

Construction of a raster map:

[Figure: Maas river log-lead predictions (raster map with contour labels). Axes: X, Y.]

A few R commands

Read the Maas data:

    a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics_c173_c273/soil.txt", header=TRUE)
    class(a)

    # Convert to a geodata object and plot it with geoR.
    library(geoR)
    b <- as.geodata(a)
    class(b)
    points(b)
    plot(b)

    # Convert to a spatial object and draw bubble plots.
    library(sp)     # provides coordinates() and bubble()
    library(gstat)
    coordinates(a) <- ~x+y
    class(a)
    bubble(a, "lead", main="Lead concentration (ppm)")
    bubble(a, "zinc", main="Zinc concentration (ppm)")

Another example

The map below shows 175 ozone stations (08 August 2005 data).

[Figure: Ozone locations in California. Axes: Longitude, Latitude.]

Try the following commands:

    a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics_c173_c273/o3.txt", header=TRUE)
    library(geoR)
    library(sp)     # provides coordinates() and bubble()
    library(gstat)
    library(maps)
    plot(a$lon, a$lat, xlim=c(-126,-112), ylim=c(32,42.2),
         xlab="Longitude", ylab="Latitude",
         main="Ozone locations in California")
    map("county", "ca", add=TRUE)

What do the following commands do?

    aa <- as.data.frame(cbind(a$lon, a$lat, a$o3))
    bb <- as.geodata(aa)
    class(bb)
    points(bb)

How about these?

    coordinates(a) <- ~lon+lat
    class(a)
    bubble(a, "o3", xlab="Longitude", ylab="Latitude",
           maxsize=1.3, key.entries=0.02*(1:6))

An example using the maps package

Data on ozone and other pollutants are collected on a regular basis. The data set for this example concerns 175 locations for ozone (ppm) in California on 08 August 2005. You can read more about smog-causing pollutants at http://www.nytimes.com/2010/01/08/science/earth/08smog.html?th&emc=th.

The data can be accessed here:

    a <- read.table("http://www.stat.ucla.edu/~nchristo/statistics_c173_c273/o3.txt", header=TRUE)

The package maps in R can be loaded as follows:

    library(maps)

We can display the data points and the map using the following commands:

    plot(a$lon, a$lat, xlim=c(-125,-114), ylim=c(32,43),
         xlab="Longitude", ylab="Latitude",
         main="Ozone locations in California")
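
The Euclidean distance introduced at the start of these notes can be checked numerically. The following is a minimal sketch, assuming three hypothetical locations whose coordinates are made up for illustration (they are not taken from the course data). It evaluates the distance formula directly for one pair of points and compares the result with the pairwise distance matrix from base R's dist() function. The helper name euclid is also an assumption introduced here.

```r
# Hypothetical coordinates for three locations s1, s2, s3
# (illustration only, not taken from the course data).
x <- c(0, 3, 6)
y <- c(0, 4, 8)
pts <- cbind(x, y)

# Direct evaluation of the Euclidean distance between points i and j:
# sqrt((x_i - x_j)^2 + (y_i - y_j)^2).
euclid <- function(i, j) sqrt((x[i] - x[j])^2 + (y[i] - y[j])^2)
d12 <- euclid(1, 2)

# All pairwise Euclidean distances at once, using base R's dist().
D <- as.matrix(dist(pts))

print(d12)
print(D)
```

For longitude and latitude data such as the ozone stations, a Euclidean distance computed on degrees is only a rough approximation of ground distance, which is one reason the notes list great circle distance as an alternative.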

