UNC-Chapel Hill GEOG 070 - Lecture 20 Data Quality

This preview shows page 1-2-3-4-5-6 out of 17 pages.


Slide titles:
• Data Quality Issues
• Data quality issues for geographic data
• Data error in spatial data
• Accuracy and error regarding data sets/maps
• Error with nominal data
• The Nominal Data Case
• Confusion Matrix Statistics
• What level of accuracy is ‘good’?
• Ratio data – error summaries
• Spatial data accuracy
• Slide 11
• Data quality issues (cont’d.)
• Some non-error data quality issues
• In the end…
• Fuzzy classification
• Fuzzy Approaches to Uncertainty
• Fuzzy Soils Mapping Example

Data Quality Issues

Data quality:
• proper understanding is crucial to success of any project involving geographic data
• no geographic data sets can be said to be error-free
• “garbage in, garbage out”

Data quality issues for geographic data

Error: Difference between the real world and the geographic data representation of it.
Accuracy (another way of describing error): Extent to which map data values match true values.
Example: Imagine a point is at 219 meters elevation above sea level, but a map represents it as 210 meters above sea level.
Error: This data point is represented with 9 meters of error.
Accuracy: This data point is accurate to within 9 meters.

Data error in spatial data

Location errors
• Example: a schoolhouse is located 30 feet away from its marked location on a map
• A 300 meter contour line is offset 5 meters to the northwest
• A satellite image pixel is located 2.4 meters away from its actual location on the ground
Attribute errors
• A schoolhouse is incorrectly labeled as a church
• A 300 meter contour line is actually supposed to be a 310 meter contour line
• A 300 meter contour line actually represents an elevation of 302 meters
• A classified satellite image pixel is labeled forest when it is actually a field

Accuracy and error regarding data sets/maps

One data point – error/accuracy can be easily defined.
Data sets/maps – error/accuracy must be summarized.
How is accuracy determined and summarized?
• Very accurate data must be collected (sampled) about a subset of the full dataset/map.
• This accurate sample is then compared with the original data.
• A summary is created that compares these 2 datasets (the sample with the same measurements from the original data).

Error with nominal data

Nominal data is right or wrong. Period.
Examples:
• Landcover type: a pixel is classified as forest or field.
• A building is classified as a school or a church.
• A county is named Orange County or Durham County.

The Nominal Data Case

An example is when you determine the accuracy of a landcover classification.
We can build something called a confusion matrix:
• This compares your classification with your ground-truth sample (the very accurate sample data, as mentioned)

            forest  fields  urban  water  wetlands  Total
forest          80       4      0     15         7    106
fields           2      17      0      9         2     30
urban           12       5      9      4         8     38
water            7       8      0     65         0     80
wetlands         3       2      1      6        38     50
Total          104      36     10     99        55    304

Axis labels on the slide: Classification, Reference

Confusion Matrix Statistics

Summarizing a confusion matrix:
Row and column summaries are made.
The most basic overall summary statistic is the percent correctly classified
• This is calculated by taking the total of the diagonal entries, dividing by the grand total, and multiplying by 100 to produce a percentage
• From our example: 209 / 304 * 100% = 68.8%
• BUT chance alone (random assignment of classes) would give a score of better than 0
A Kappa index:
• Determined through a “semi-complex” computation.
• It is another measure describing overall accuracy of a classification, ranging between 0 and 100%.
• A Kappa index can be used to test if a classification is statistically significantly better than a random classification.
• The Kappa index for our example evaluates to 58.3%

What level of accuracy is ‘good’?

The overall accuracy (and the row and column accuracies) is generally considered good/acceptable if it is above 85%. The USGS uses this as a guideline.
The Kappa statistic describes agreement between the classified data and the reference data (it represents the increased accuracy of the performed classification over that of a random classification).
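The two summary figures quoted for the example matrix can be reproduced with a few lines of Python. This is only a sketch: it assumes the slide's "Kappa index" is Cohen's kappa (observed agreement corrected for the agreement expected from the row and column totals), which does give the quoted 58.3%. The names `MATRIX`, `percent_correct`, and `kappa_percent` are made up for illustration.

```python
# Example confusion matrix from the lecture, without the Total row/column.
# Assumption: rows and columns use the same class order (forest, fields,
# urban, water, wetlands). Swapping which axis is "Classification" and which
# is "Reference" does not change either statistic computed here.
MATRIX = [
    [80, 4, 0, 15, 7],
    [2, 17, 0, 9, 2],
    [12, 5, 9, 4, 8],
    [7, 8, 0, 65, 0],
    [3, 2, 1, 6, 38],
]


def percent_correct(matrix):
    """Sum of the diagonal divided by the grand total, times 100."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * diagonal / total


def kappa_percent(matrix):
    """Cohen's kappa as a percentage (assumed to be the slide's Kappa index)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # Agreement expected if classes were assigned at random with the same
    # row and column totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total**2
    return 100.0 * (observed - expected) / (1.0 - expected)


print(f"Percent correctly classified: {percent_correct(MATRIX):.2f}%")
print(f"Kappa: {kappa_percent(MATRIX):.2f}%")
```

Run on the example, the first value is 209 / 304 = 68.75% (the slide's 68.8%) and the second rounds to 58.3%.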
A Kappa statistic of:
• Above 80% is considered to have strong agreement.
• Between 40% and 80% is considered to have moderate agreement.
• Below 40% is considered to have poor agreement.

Ratio data – error summaries

The overall magnitude of errors in ratio measurements can be summarized using the root mean square error (RMSE):
• Calculated by taking the square root of the average squared error
• This is a kind of average error
• This is the primary measure of accuracy used in map accuracy standards and GIS databases
• e.g. we might state that the elevations in a certain digital elevation model have an RMSE of 2 meters.
• 2 meters is a sort of “average error” for a data point.
• However, data error will range above and below this number.
Question: is this an example of locational error or attribute error?

Spatial data accuracy

Locational data accuracy can also be summarized with RMSE.
• A kind of average of the distance between where points/pixels are represented and their actual location on the ground.
Locational data can also be summarized in other ways:
For horizontal data, the USGS uses the US National Map Accuracy Standards:
• 90% of all measurable points are within 1/50 of an inch (measured at map scale) for maps of spatial scale less than or equal to 1:20,000, and within 1/30 of an inch for maps of spatial scale greater than 1:20,000.

Precision: Level of detail at which data values are recorded.
Often referred to as ‘significant digits’.
Example: A cell in a raster DEM recorded as 219 meters is less precise than a cell recorded at 219.05 meters.

Data quality issues (cont’d.)

Error is unbiased when the error is in ‘random’ directions.
• GPS data
• Human error in surveying points
Error is biased when there is systematic variation in accuracy within a geographic data set
• Example: GIS tech mistypes coordinate values when entering control
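The RMSE summaries and the map-scale tolerance described above can be sketched in Python as follows. All sample values are hypothetical, and the function names (`rmse`, `locational_errors`, `map_tolerance_ground_m`) are invented for this illustration. The tolerance helper simply converts the 1/50 inch and 1/30 inch figures quoted in the lecture into ground meters; it is not an official implementation of the standard.

```python
import math


def rmse(errors):
    """Root mean square error: square root of the average squared error."""
    if not errors:
        raise ValueError("need at least one error value")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def locational_errors(mapped, actual):
    """Straight-line distance between each mapped point and its true position."""
    return [
        math.hypot(mx - ax, my - ay)
        for (mx, my), (ax, ay) in zip(mapped, actual)
    ]


def map_tolerance_ground_m(scale_denominator):
    """Ground distance in meters for the map tolerance quoted in the lecture.

    Scales greater than 1:20,000 (denominator below 20,000) use 1/30 inch;
    scales of 1:20,000 or less use 1/50 inch.
    """
    map_inches = 1 / 30 if scale_denominator < 20000 else 1 / 50
    return map_inches * scale_denominator * 0.0254  # inches to meters


# Hypothetical elevation errors (map value minus true value, in meters).
elevation_errors = [2.0, -1.0, 3.0, -2.0]
print(f"Elevation RMSE: {rmse(elevation_errors):.2f} m")

# Hypothetical mapped and surveyed positions (meters).
mapped = [(0.0, 0.0), (10.0, 10.0)]
actual = [(3.0, 4.0), (10.0, 12.0)]
print(f"Locational RMSE: {rmse(locational_errors(mapped, actual)):.2f} m")

# Tolerance on the ground for a 1:24,000 map.
print(f"1:24,000 tolerance: {map_tolerance_ground_m(24000):.2f} m")
```

Signed elevation errors are squared before averaging, so positive and negative errors do not cancel. For a 1:24,000 map, 1/50 inch corresponds to 480 inches (40 feet, about 12.2 meters) on the ground.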

