UF STA 6166 - Introduction to Regression

Topic 20 - SUMMARIZING BIVARIATE DATA – Both Variables Are Quantitative

Defn: BIVARIATE DATA are a set of data in which observations of two different variables are measured on each unit. They are usually denoted as (x_i, y_i), i = 1,…,n.

EXAMPLES

1. Peas are known to be a self-intolerant crop because repeated planting in the same field makes them susceptible to root rot diseases. In a study to examine how the crop interval (# years the field does not have a pea or other legume crop) is related to the severity of root rot disease in pea crops, soil samples from ten fields currently planted in peas were obtained. For each field, they recorded the number of years since the last pea crop (“Interval”) and an index of the level of disease (“Index”; higher value = more disease). The data are:

Interval   Index   Field Block (ID)
    0       4.5    B
    4       3.7    L
    6       4      M
    6       3      A
    6       3.1    W
    8       2.8    QQ
    9       1.9    R
    9       3      D
    9       2.3    DD
   14       1.6    T

2. A study of the relationship between smoking and lung cancer involved a case control study using 2000 people. 1000 people in the study had lung cancer; the other 1000 did not have lung cancer but had life styles similar to the cancer patients. For each person, the number of pack years and whether or not they had cancer were recorded.
The results for the ith person were recorded as

x_i = category for pack years of smoking (1 = never smoked, 2 = 0.1 to 5 pack years, 3 = 5.1 to 12 pack years, 4 = 12.1 to 20 pack years, 5 = greater than 20 pack years)
y_i = category of lung cancer (N = no, Y = yes)

Part of the data might look like:

SSN           X   Y
123-44-5678   3   No
234-55-1234   0   No
987-23-8723   4   Yes
237-32-8736   5   Yes
843-23-8482   2   No
…
n = 2000

A) Summarizing Two Quantitative Variables

In addition to looking at the usual univariate descriptions (histograms, means, standard deviations, etc.), we can also consider displaying and summarizing the RELATIONSHIP between two quantitative variables. One variable is sometimes thought to depend on the other variable in some way. Hence, we designate one variable to be the RESPONSE VARIABLE (Y) and the other to be the EXPLANATORY VARIABLE (X).

1) Graphical Summary

Defn: a SCATTERPLOT shows the relationship between two quantitative variables measured on the same units in the population. The values of the response variable are plotted on the Y-axis and the values of the explanatory variable are plotted on the X-axis.
Each pair (x_i, y_i) is represented by a single point on the plot.

EXAMPLE Intolerant peas

[Figure: scatterplot, Bivariate Fit of Index By Interval]

Note that:
1) there does appear to be a relationship: as the interval between plantings increases, disease levels in the second crop go down
2) the relationship looks more or less linear (Y = a + bX)
3) the points do NOT fall exactly on a straight line (the relationship is not purely deterministic)
4) observations with the same X-value have differing Y-values (not a perfect relationship; other variables may provide additional explanation)

2) Numerical Summaries of the Relationship Between X and Y

a) Pearson’s Correlation Coefficient for Linear Relationships

[Figure: four scatterplots of Y against X]

Which shows the strongest relationship between Y and X? The weakest? Order them from weak to strong. Which show a positive relationship? A negative relationship? Are any of the relationships non-linear?

The Sample Pearson’s Correlation Coefficient, r, is a quantitative assessment of the strength and direction of a linear relationship between 2 variables (i.e., we assume that, if a relationship exists, it is linear).

Rules:
1) if the relationship is positive (slope > 0), r > 0; if the relationship is negative (slope < 0), r < 0
2) if there is no linear relationship (slope = 0), r = 0
3) if the relationship is perfect (every point falls exactly on a straight line), r = ±1, depending on the sign of the slope

⇒ The stronger the relationship, the closer r is to ±1. The weaker the relationship, the closer r is to 0.
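The three rules can be checked on small made-up data sets. The sketch below is illustrative only: `pearson_r` is a hypothetical helper, not part of the notes, and it uses the sum-of-products form of r, in which the (n - 1) factors cancel. The four y-vectors are invented for the demonstration.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data, chosen only to illustrate the rules.
x = [1, 2, 3, 4]
r_up = pearson_r(x, [3, 5, 7, 9])      # points exactly on y = 1 + 2x
r_down = pearson_r(x, [9, 7, 5, 3])    # points exactly on y = 11 - 2x
r_flat = pearson_r(x, [1, -1, -1, 1])  # no linear trend in y
r_noisy = pearson_r(x, [3, 6, 6, 9])   # increasing, but not on one line

print(r_up, r_down, r_flat, round(r_noisy, 4))
```

The perfect lines give r = +1 and r = -1, the data with no linear trend give r = 0, and the imperfect increasing data give a value strictly between 0 and 1.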
[Figure: three scatterplots illustrating slope < 0, slope = 0, and slope > 0]

To calculate it:

\[
r = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y} = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} z_{y_i}
\]

where s_x is the standard deviation of the X variable data, s_y is the standard deviation of the Y variable data, and z_{x_i} = (x_i - \bar{x})/s_x and z_{y_i} = (y_i - \bar{y})/s_y are the standardized values.

The Population Pearson’s Correlation Coefficient is denoted with the Greek letter rho, ρ. ρ is calculated similarly using the (x, y) values for every unit in the population. It has the same meaning and interpretations as the sample correlation coefficient.

EXAMPLE Intolerant peas. From the graph the relationship looks linear and negative, so it is appropriate to use Pearson’s correlation estimate and we expect the value will be negative. Define X = time interval and Y = disease index.

\[
\bar{x} = 7.10, \quad s_x = 3.6953, \quad \bar{y} = 2.99, \quad s_y = 0.9097
\]

By hand:

Field    X     Y      Z_X      Z_Y     Z_X Z_Y
B        0     4.5   -1.921    1.659   -3.189
L        4     3.7   -0.838    0.780   -0.654
M        6     4     -0.297    1.110   -0.330
A        6     3     -0.297    0.010   -0.003
W        6     3.1   -0.297    0.120   -0.03
QQ       8     2.8    0.2435  -0.20    -0.05
R        9     1.9    0.5141  -1.19    -0.61
D        9     3      0.5141   0.010    0.01
DD       9     2.3    0.5141  -0.758   -0.39
T       14     1.6    1.8672  -1.527   -2.85
Sum     71    29.9    0        0       -8.117

\[
r = \frac{1}{n-1} \sum_{i=1}^{n} z_{x_i} z_{y_i} = \frac{-8.117}{9} = -0.9019
\]

which implies that there is a strong linear relationship between the crop interval (years between pea plantings) and the index of disease, and that the relationship is negative, i.e., as crop interval goes up, the disease index goes down.

In JMP:

Multivariate
Correlations

               Interval (X)   Index (Y)
Interval (X)      1.0000       -0.9019
Index (Y)        -0.9019        1.0000

[Figure: Scatterplot Matrix of Interval (X) and Index (Y)]
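The hand calculation for the pea data can be reproduced with a short script. This is a minimal sketch using only Python's standard library; the variable names are illustrative and not from the notes. `statistics.stdev` uses the n - 1 divisor, matching the sample standard deviations s_x and s_y.

```python
from statistics import mean, stdev

# Pea data from Example 1: crop interval (years) and disease index.
interval = [0, 4, 6, 6, 6, 8, 9, 9, 9, 14]
index = [4.5, 3.7, 4.0, 3.0, 3.1, 2.8, 1.9, 3.0, 2.3, 1.6]

n = len(interval)
x_bar, s_x = mean(interval), stdev(interval)  # stdev divides by n - 1
y_bar, s_y = mean(index), stdev(index)

# Standardize each observation, sum the products, then divide by n - 1.
z_x = [(x - x_bar) / s_x for x in interval]
z_y = [(y - y_bar) / s_y for y in index]
z_sum = sum(zx * zy for zx, zy in zip(z_x, z_y))
r = z_sum / (n - 1)

print(f"x_bar={x_bar:.2f}  s_x={s_x:.4f}  y_bar={y_bar:.2f}  s_y={s_y:.4f}")
print(f"sum of z products={z_sum:.3f}  r={r:.4f}")
```

Because the script carries full precision through every step, individual z-scores and products can differ in the last digit from the table entries, which are shown to only two or three decimals.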

