BIOM301 Chapter 3: Descriptive Analysis and Presentation of Bivariate Data

Bivariate Data
• Occurs when 2 variables are measured on the same experimental unit
   o Data come in pairs: 1 pair per experimental unit
   o Assumes each pair of observations was collected independently and without bias
• Can be:
   o 2 qualitative variables. Ex: Gender, College Major. Use a contingency table or side-by-side graphs.
   o 1 qualitative and 1 quantitative variable. Ex: Gender, Height. Use multiple box-and-whisker graphs.
   o 2 quantitative variables. Ex: Height, Shoe Size. Use a scatter plot.
      - Is there a pattern? What is the pattern? How can we interpret this pattern?

Linear Correlation (r)
• 2 variables are related to each other in a linear manner
• r always measures the direction and strength of the linear relationship between 2 quantitative variables
   o Direction
      - r is positive (+): positive correlation
      - r is negative (-): negative correlation
   o Strength
      - r = +1: perfect positive correlation
      - r = -1: perfect negative correlation
      - r between -1 and 0, or between 0 and +1: intermediate correlation
      - r = 0: no correlation

Correlation Coefficient (r)
   o Provides info on the direction and strength of the relationship between 2 variables
   o Varies from -1 to +1
   o r = 0 can occur in 3 ways:
      - No trend in the data
      - X doesn't change with Y, or Y doesn't change with X
      - The relationship is not linear, BUT THERE IS A RELATIONSHIP that is missed by r
   o Position of the variables on the graph is interchangeable: swapping x and y gives the same r value
   o Changing the data by a constant (adding a constant to every value of a variable, or multiplying every value by a positive constant) does not change the r value either

Correlation Concerns
• Always accompany a correlation coefficient with a scatterplot
   1. Check for nonlinear relationships
   2. Check for impact of outliers (you need justification to remove valid data from a dataset)
   3. Correlation is not causation
   4. Third variable problem
   5. Restricted range
   6. You should never extrapolate beyond your data set
• ASK what is being graphed. Ex: a trend across countries is not a trend within countries.
• The significance of r is impacted by n: doubling the observations, even if identical, leaves the r value itself unchanged but makes the same r more likely to test as statistically significant
• If the data show a nonlinear pattern, you may be able to transform the data
to fit a linear model.
• 2 variables could have a strong correlation because they are both strongly related to a 3rd variable (a lurking variable)
• Graphing data is CRUCIAL: many graphs can have the same r value but would be interpreted differently

Using the Right Terms
• Correlation coefficients describe the relationship between 2 variables
• They allow us to make associations, BUT we are restricted to saying "tends to", NOT CAUSATION
• Layman's terms:
   o "Study LINKS cleaner air to longer life"
   o "Scientists have found a CONNECTION between increasing the volume of beta carotene in the diet and fewer eye problems"
   o "Paleontologists have found a RELATIONSHIP between fossil snake length and climate temperature. Snakes as long as 43 feet occurred when temperatures were 10 degrees warmer"
   o "Match.com has found that on-site activity is TIED to the U.S. economy. The site is busier when economic indicators are more negative. Misery loves company"

Correlation Statistical Test
• Tests whether your r value is significantly different from 0
   o If r = 0: no linear relationship between your x and y variables
   o If r is significantly different from 0: a statistically significant positive or negative relationship exists
• Excel can do the statistical test (Pearson's Correlation Test)
• FOR NOW: if your p-value is less than 0.05, you have a statistically significant correlation
   o You would then need to look at r to see if it is positive (+) or negative (-)

Linear Regression
• Goal: predict a value for y (the output, or dependent, variable) given a value of x (the input, or independent, variable)
• Regression determines the best-fit line: it minimizes the deviations between the line and the actual data points, measured vertically
   o Why vertically? It provides a much better fit
• The equation for the best-fit line, y = b0 + b1*x, has 2 components:
   o Estimate of the slope of the line (b1)
   o Estimate of the y-intercept (b0), which sets the vertical position of the line
• Regression is not restricted to simple linear relationships
   o If you can generate an equation to describe the relationship, you can use regression
• Several additional pieces of information are usually provided with
regression results:
   o R2 value: the amount of variability in the dependent variable (y) explained by the variability in the independent variable (x)
      - Ranges from R2 = 0 (no relationship between x and y) to R2 = 1 (perfect relationship, e.g., a straight line for linear regression)
• Can also do a statistical test, asking if the slope of the line is statistically significantly different from 0
   o Need to interpret the p-value and then look at the slope to see if the relationship is positive (+) or negative (-)
   o The slope is NOT significantly different from zero: p-value > 0.05
   o The slope is significantly different from zero: p-value < 0.05

Components of Presenting Regression Results
• Title
• Axes labeled
• Line drawn only within the range of the data
• Equation for the line
• R2 value

Regression Concerns (Similar to Correlation Concerns)
• Outliers can have a big impact
• Never extrapolate beyond the range of the data
• The relationship may be nonlinear: be sure to graph the data first
• Lurking variables

Interpretation: Is it Correlation or is it Regression?
• Correlation: just looks for trends in 2 variables
• Regression: you are saying that the y variable is a function of the x variable
• But regression is often used to relate 2 variables
• When can you do regression?

Regression (cont.)
• Causality can only be shown with controlled experiments
   o Example: an experiment with 4 levels of fertilizer (0, 1, 2, and 3 mg/m), with 6 plants assigned to each fertilizer treatment; look at how tall they grow in 2 weeks
   o The amount of fertilizer is CONTROLLED and assumed to be administered without error
   o You look only at variability in the y variable (plant height) to generate the best-fit line
• Regression minimizes the variability only in the y direction when it generates the best-fit regression line
   o Need to assume that your x variable was measured without error
   o But if your goal is to predict y as a function of x, regression is the correct approach

Correlation vs. Regression
• Correlation
   o Do x and y vary together?
   o Linear only
   o Not causation
   o r is the correlation coefficient; varies from -1 to 0 to +1
• Regression
   o Generates a predictive relationship where y is a
     function of x
   o R2 tells how much of the variability in y is explained by the x variable; varies from 0 to 1
   o May be a causal relationship; be careful of interpretation (survey or experiment?)
   o Can be used for nonlinear relationships

What should you do?
• Read Chapter 3
• Problems: none
• Chapter Practice Test: Parts I and II
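
Worked illustration (not part of the original notes): the properties of r and the best-fit line described in the correlation and regression sections can be checked numerically. The sketch below uses plain Python with invented height and shoe-size values; the function names pearson_r and best_fit_line and all the data are assumptions made for illustration. It assumes the best-fit line is the ordinary least-squares line, which minimizes the squared vertical deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_fit_line(xs, ys):
    """Least-squares line y = b0 + b1*x. Returns (b0, b1, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                # slope estimate
    b0 = mean_y - b1 * mean_x     # y-intercept estimate
    # R2 = 1 - (unexplained vertical variation) / (total variation in y)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return b0, b1, 1 - ss_res / ss_tot

# Hypothetical data, invented for this example: height (inches), shoe size
height = [62, 64, 66, 68, 70, 72]
shoe = [6.5, 7.0, 8.5, 9.0, 10.5, 11.0]

r = pearson_r(height, shoe)
r_swapped = pearson_r(shoe, height)                    # x and y interchanged
r_shifted = pearson_r([h + 10 for h in height], shoe)  # constant added to x
r_doubled = pearson_r(height * 2, shoe * 2)            # every pair repeated

# A clear but nonlinear relationship (y = x squared) that r misses
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

b0, b1, r_squared = best_fit_line(height, shoe)

print("r:", round(r, 4))
print("r with x and y swapped:", round(r_swapped, 4))
print("r after adding a constant to x:", round(r_shifted, 4))
print("r after repeating every observation:", round(r_doubled, 4))
print("r for y = x^2 on symmetric x:", round(r_curve, 4))
print("best-fit line: y =", round(b0, 3), "+", round(b1, 3), "* x")
print("R2:", round(r_squared, 4), " r^2:", round(r ** 2, 4))
```

The first four r values printed should agree, and for a straight-line fit R2 should equal r squared. The code does not compute a p-value; for that, use Excel or a statistics package as the notes suggest.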