UNC-Chapel Hill GEOG 070 - Simple Linear Regression

School: University of North Carolina at Chapel Hill

Course: GEOG 070

Pages: 21

Outline:
Simple Linear Regression
Two Sorts of Bivariate Relationships
A Deterministic Relationship
A Probabilistic Relationship
Simple Linear Regression
Fitting a Line to a Set of Points
Least Squares Method
Minimizing the Error Term ε
Error Sum of Squares
Finding Regression Coefficients
Interpreting Slope (b)
The Strength of Relationships
Coefficient of Determination (R²)
Pond Branch Catchment – Control Color Infrared Digital Orthophotography
Soil Moisture Sampling Method
Pond Branch Catchment – Control Topographic Index Example
Comparing Soil Moisture and TMI

David Tenenbaum – GEOG 070 – UNC-CH Spring 2005

Simple Linear Regression
• Up until this point, the quantitative methods we have been studying have been designed to help us understand a single random variable
• In many cases, we are interested in examining the relationships between multiple variables
• In geography, this is usually done in an effort to explain the spatial pattern of one set of values in terms of other, co-located spatial patterns
• The simplest instances of these approaches are bivariate analyses, where we study the relationship between two variables

Two Sorts of Bivariate Relationships
• Generally, we can classify the relationship between a pair of variables into one of two types:
• A bivariate relationship can be deterministic, where knowledge of one of the variables entails perfect knowledge of the other, OR
• A bivariate relationship can be probabilistic, where knowledge of one of the variables allows you to estimate the value of the other, but not with absolute accuracy and/or certainty

A Deterministic Relationship
• Suppose we are traveling from one place to another on the Interstate at a constant speed
• There is a deterministic relationship between the time spent driving and the distance traveled, which we can express graphically or using an equation:
[Figure: distance ($s$) plotted against time ($t$) as a straight line with slope $v$ and intercept $s_0$]
$$s = s_0 + vt$$
where $s$ is the distance traveled, $s_0$ is the initial distance, $v$ is the speed, and $t$ is the time traveled
• Unfortunately, few relationships are truly deterministic

A Probabilistic Relationship
• More often, we find relationships between two variables that have a probabilistic nature
• For example, suppose we compare the ages and heights of a sample of young people between 2 and 20 years old:
[Figure: scatterplot of height (meters) against age (years)]
• Here, we cannot predict height from age as precisely as we could predict distance from time in the previous example
• There is a relationship here, but this model contains an element of unpredictability, or error

Simple Linear Regression
• One way to characterize a probabilistic relationship like the one on the previous slide is simple linear regression, a linear model of the following form:
$$y = a + bx + \varepsilon$$
where $x$ is the independent variable, $y$ is the dependent variable, $b$ is the slope of the fitted line, $a$ is the intercept of the fitted line, and $\varepsilon$ is the error term
[Figure: fitted line on axes $x$ (independent) and $y$ (dependent), annotated with slope $b$, intercept $a$, and error $\varepsilon$]

Fitting a Line to a Set of Points
• When we have a data set consisting of an independent and a dependent variable and plot them on a scatterplot, constructing a model of the relationship between the variables means selecting a line that represents that relationship:
[Figure: scatterplot of $y$ (dependent) against $x$ (independent)]
• We can choose the line that fits best using a least squares method
• The least squares line is the line that minimizes the sum of the squared vertical distances between the points and the line, i.e., it minimizes the error term $\varepsilon$ when it is considered over all points in the data set

Least Squares Method
• The least squares method finds this line mathematically, by minimizing the error term $\varepsilon$ over all points
• We can describe the line of best fit using the equation $\hat{y} = a + bx$; recall from a previous slide that our linear model was expressed as $y = a + bx + \varepsilon$
[Figure: a data point $y$ above the fitted line $\hat{y} = a + bx$, with the vertical gap $(y - \hat{y})$ marked]
• We use the value $\hat{y}$ on the line to estimate the true value, $y$
• The difference between the two is $(y - \hat{y})$
• This difference is positive for points above the line and negative for points below it

Minimizing the Error Term ε
• In a linear model, the error in estimating the true value of the dependent variable $y$ is the difference between the true value and the estimated value $\hat{y}$: $\varepsilon = y - \hat{y}$
• Sometimes this difference will be positive (when the line underestimates the value of $y$) and sometimes negative (when the line overestimates it), because there will be points both above and below the line
• If we were simply to sum these error terms, the positive and negative values would cancel out
• Instead, we can square the differences and then sum them to create a useful measure of the overall error

Error Sum of Squares
• By squaring the differences between $y$ and $\hat{y}$ and summing these values over all points in the data set, we calculate the error sum of squares (usually denoted SSE):
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
• The least squares method selects a line of best fit by finding the parameters of a line (intercept $a$ and slope $b$) that minimize the error sum of squares; it is called the least squares method because it finds the line that makes the SSE as small as possible, minimizing the squared vertical distances between the line and the points

Finding Regression Coefficients
• The equations used to find the slope ($b$) and intercept ($a$) of the line of best fit by the least squares method are:
$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$a = \bar{y} - b\bar{x}$$
where $x_i$ is the $i$th independent variable value, $y_i$ is the $i$th dependent variable value, $\bar{x}$ is the mean of all the $x_i$ values, and $\bar{y}$ is the mean of all the $y_i$ values

Interpreting Slope (b)
• Positive relationship: as the values of $x$ increase, the values of $y$ increase too
• Negative (a.k.a. inverse) relationship: as the values of $x$ increase, the values of $y$ decrease
• The slope of the line ($b$),
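The contrast between the deterministic and probabilistic relationships described in these slides can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original slides: the initial distance $s_0 = 10$, the speed $v = 100$, and the age/height intercept, slope, and noise level are hypothetical values, and the error term $\varepsilon$ is assumed to be normally distributed only so that something can be simulated.

```python
import random

def distance(t, s0=10.0, v=100.0):
    """Deterministic model s = s0 + v*t: knowing t fixes s exactly.
    The default s0 and v are hypothetical illustration values."""
    return s0 + v * t

def simulate_linear(xs, a, b, sigma, seed=0):
    """Probabilistic model y = a + b*x + eps.

    eps is drawn from a normal distribution with mean 0 and standard
    deviation sigma; the normal error is an illustrative assumption.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [a + b * x + rng.gauss(0.0, sigma) for x in xs]

# Deterministic: the same t always gives the same s
times = [0.0, 0.5, 1.0, 1.5, 2.0]
print([distance(t) for t in times])  # [10.0, 60.0, 110.0, 160.0, 210.0]

# Probabilistic: hypothetical age (years) / height (meters) parameters
ages = [2, 5, 8, 11, 14, 17, 20]
heights = simulate_linear(ages, a=0.75, b=0.055, sigma=0.05)
for age, h in zip(ages, heights):
    line = 0.75 + 0.055 * age  # value on the underlying line
    print(f"age {age:2d}: height {h:.2f} m, line {line:.2f} m, error {h - line:+.3f} m")
```

Setting `sigma` to 0 removes the error term, and `simulate_linear` then reproduces the line `a + b*x` exactly: the probabilistic model collapses to a deterministic one.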
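The argument on the "Minimizing the Error Term ε" and "Error Sum of Squares" slides can be checked numerically. The following Python sketch uses four made-up points; `residuals` and `sse` are hypothetical helper names, not from the course. Both candidate lines pass through the point of means $(\bar{x}, \bar{y})$, which guarantees that their raw errors sum to zero, so only the squared errors can tell them apart.

```python
def residuals(xs, ys, a, b):
    """Errors e_i = y_i - yhat_i for the candidate line yhat = a + b*x.

    Positive for points above the line (the line underestimates y),
    negative for points below it (the line overestimates y).
    """
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sse(xs, ys, a, b):
    """Error sum of squares: SSE = sum over i of (y_i - yhat_i)**2."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))

# Made-up illustrative data (mean of x = 2.5, mean of y = 4.0)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

# Two candidate lines, both passing through the point of means (2.5, 4.0)
candidates = {"flat": (4.0, 0.0), "tilted": (0.5, 1.4)}

for name, (a, b) in candidates.items():
    e = residuals(xs, ys, a, b)
    # abs() only keeps floating-point noise from printing as "-0.000000"
    print(f"{name}: errors {[round(v, 2) for v in e]}, "
          f"sum of errors {abs(sum(e)):.6f}, SSE {sse(xs, ys, a, b):.4f}")
```

Both lines give a raw error sum of zero (up to floating-point rounding), so that sum cannot distinguish them; the SSE (10 for the flat line versus about 0.2 for the tilted one) can.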
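The formulas on the "Finding Regression Coefficients" slide translate directly into code, and the sign of the resulting slope gives the positive or negative (inverse) relationship from "Interpreting Slope (b)". A minimal Python sketch using only the standard library; `least_squares` and `describe_slope` are hypothetical names, and the data are made up for illustration.

```python
from statistics import mean

def least_squares(xs, ys):
    """Intercept a and slope b of the least squares line yhat = a + b*x.

    b = sum((x_i - xbar) * (y_i - ybar)) / sum((x_i - xbar)**2)
    a = ybar - b * xbar
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

def describe_slope(b):
    """Classify the relationship by the sign of the slope."""
    if b > 0:
        return "positive"
    if b < 0:
        return "negative (inverse)"
    return "no linear trend"

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
a, b = least_squares(xs, ys)
print(round(a, 6), round(b, 6), describe_slope(b))  # 0.5 1.4 positive
```

Reversing the y values (`[6.0, 5.0, 3.0, 2.0]`) gives b = -1.4, which `describe_slope` reports as a negative (inverse) relationship. On Python 3.10 and later, the standard library's `statistics.linear_regression` returns the same slope and intercept if a library call is preferred over the hand-written formulas.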
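The slides state the least squares formulas for $b$ and $a$ without deriving them. One standard derivation, sketched here and not necessarily the route taken in the full lecture, sets the partial derivatives of SSE with respect to $a$ and $b$ to zero. The second step substitutes $a = \bar{y} - b\bar{x}$, so that $y_i - a - bx_i = (y_i - \bar{y}) - b(x_i - \bar{x})$; the last step uses $\sum_{i}\bar{x}(y_i - \bar{y}) = 0$ and $\sum_{i}\bar{x}(x_i - \bar{x}) = 0$ to replace $x_i$ by $(x_i - \bar{x})$ in front of each sum.

```latex
\begin{aligned}
SSE(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial\, SSE}{\partial a} &= -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
  \;\Longrightarrow\; a = \bar{y} - b\bar{x} \\
\frac{\partial\, SSE}{\partial b} &= -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} x_i (y_i - \bar{y}) = b \sum_{i=1}^{n} x_i (x_i - \bar{x}) \\
&\Longrightarrow\; b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```

Because SSE is a quadratic in $a$ and $b$ that opens upward whenever the $x_i$ are not all equal, this stationary point is the minimum.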
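As a numerical sanity check of the claim that the least squares line makes the SSE "as small as it can possibly be", the sketch below compares the closed-form line against every line on a grid of nearby intercepts and slopes. It is a check on one made-up data set, not a proof; the grid range (plus or minus 1.0) and spacing (0.1) are arbitrary choices, and the helper names are hypothetical.

```python
from itertools import product
from statistics import mean

def sse(xs, ys, a, b):
    """Error sum of squares for the line yhat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form least squares estimates (a, b) from the slide formulas."""
    xbar, ybar = mean(xs), mean(ys)
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return ybar - b * xbar, b

# Made-up illustrative data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

a_hat, b_hat = least_squares(xs, ys)
best = sse(xs, ys, a_hat, b_hat)

# Shift intercept and slope by -1.0, -0.9, ..., +1.0 in every combination
shifts = [k / 10 for k in range(-10, 11)]
others = [sse(xs, ys, a_hat + da, b_hat + db) for da, db in product(shifts, shifts)]

# No grid line should beat the closed-form one (small tolerance for rounding)
print(f"SSE of least squares line: {best:.4f}")
print("any grid line with smaller SSE?", any(s < best - 1e-12 for s in others))
```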
