STEVENS MA 331 - MA 331 Lecture 2 Notes



Slide outline:
• Lecture 2
• Describing distributions with numbers
• Mean
• Median
• Measure of center: the median
• Measures of spread: Quartiles
• Using R
• Five-Number Summary
• R code
• The criterion for suspected outliers
• Calculations …
• Properties of the standard deviation
• Linear Transformations: changing units of measurement
• Effect of a linear transformation
• The normal distribution
• Formula (Redundant in this class)
• Finding probabilities for normal data
• Normal quantile plots (R: qqnorm())
• Newcomb’s data without outliers
• Looking at Data: Relationships
• Scatterplots
• Explanatory and response variables
• Interpreting scatterplots
• Form and direction of an association
• Strength of the association
• How to scale a scatterplot
• Outliers
• R Graphical system
• Example 2: Adding categorical variable/grouping (region): e is for northeastern states and m is for midwestern states (others excluded). May enhance understanding of the data.
• Categorical variables in scatterplots
• Categorical explanatory variables
• Example: Beetles trapped on boards of different colors
• Scatterplot smoothers
• Correlation Coefficient
• The correlation coefficient "r"
• "r" does not distinguish x & y
• "r" has no unit
• "r" ranges from -1 to +1
• Thought quiz on correlation

Lecture 2
Describing data with graphs and numbers. Normal Distribution.
Data relationships.

Describing distributions with numbers
• Mean
• Median
• Quartiles
• Five-number summary. Boxplots
• Standard deviation

Mean
• The mean: the arithmetic mean of a data set (average value)
• Denoted by $\bar{x}$:
  $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Mean highway mileage for 19 two-seaters:
  – Sum: 24 + 30 + … + 30 = 490
  – Divide by n = 19
  – Average: 25.8 miles/gallon
  – Problem: the Honda Insight gets 68 miles/gallon! If we exclude it, the mean mileage is 23.4 miles/gallon.
• The mean is easily influenced by outliers. It is not a robust measure of center.

Median
• The median is the midpoint of a distribution.
• The median is a resistant (robust) measure of center: it is not sensitive to extreme observations.
• In a symmetric distribution, mean = median.
• In a skewed distribution, the mean is farther out in the long tail than the median is.
• Example: house prices are usually right-skewed.
  – The mean price of existing houses sold in 2000 in Indiana was 176,200. (The mean chases the right tail.)
  – The median price of these houses was 139,000.

Measure of center: the median
The median is the midpoint of a distribution: the number such that half of the observations are smaller and half are larger.

1. Sort observations by size. n = number of observations.

2.a. If n is odd, the median is observation (n+1)/2 down the list.

   Positions 1-12:   0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3
   Position 13:      3.4
   Positions 14-25:  3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 6.1

   n = 25, so (n+1)/2 = 26/2 = 13 and Median = 3.4

2.b. If n is even, the median is the mean of the two middle observations.

   Positions 1-11:   0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9
   Positions 12-13:  3.3 3.4
   Positions 14-24:  3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6

   n = 24, so n/2 = 12 and Median = (3.3 + 3.4)/2 = 3.35

[Figure: mean and median for a symmetric distribution and for left-skewed and right-skewed distributions]

Comparing the mean and the median
The mean and the median are the same if the distribution is symmetric. The median is a measure of center that is resistant to skew and outliers. The mean is not.

Mean and median of a distribution with outliers
[Figure: distribution without and with the outliers; axis label: percent of people dying]
• Without the outliers: $\bar{x} = 3.4$
• With the outliers: $\bar{x} = 4.2$
• The mean is pulled to the right a lot by the outliers (from 3.4 to 4.2).
• The median, on the other hand, is only slightly pulled to the right by the outliers (from 3.4 to 3.6).

Impact of skewed data
Mean and median of a symmetric and a right-skewed distribution:
• Disease X (symmetric): $\bar{x} = 3.4$, $M = 3.4$. Mean and median are the same.
• Multiple myeloma (right-skewed): $\bar{x} = 3.4$, $M = 2.5$. The mean is pulled toward the skew.

Measures of spread: Quartiles
• Quartiles divide the data into four parts.
• p-th percentile: p percent of the observations fall at or below it.
• Median: 50th percentile
• Q1, first quartile: 25th percentile (median of the lower half of the data)
• Q3, third quartile: 75th percentile (median of the upper half of the data)

Using R:
• First things first: import the data. I prefer to use Excel first to save the data into a .csv file (comma-separated values).
• Read the file TA01_008.XLS from the CD and save it as TA01_008.csv
• Now R: I like to use Tinn-R as the editor. Open Tinn-R and save a file in the same directory where you put the .csv file.
• Now go to R/Rgui/ and click Initiate preferred. If everything is configured correctly, an R window should open.
• Now type and send this line to R:
    table1.08=read.csv("TA01_008.csv",header=TRUE)
  – This imports the data into R and tells R that the first line of the file contains the variable names.
  – table1.08 has a “table” structure (a data frame in R).
    To access individual components in it, use table1.08$nameofvariable, for example:
    table1.08$CarType
  – Produces:
     [1] Two  Two  Two  Two  Two  Two  Two  Two  Two  Two  Two  Two  Two  Two  Two
    [16] Two  Two  Two  Two  Mini Mini Mini Mini Mini Mini Mini Mini Mini Mini Mini
    Levels: Mini Two
  – This is a vector; notice that R knows it is a categorical variable.
• mean(x) calculates the mean of variable x
• median(x) gives the median
• You should read section 3.1 in the R textbook for all the functions you will need.
• summary(data.object) is another useful function. For example:
    summary(table1.08)
      CarType       City           Highway
      Mini:11   Min.   : 8.00   Min.   :13.00
      Two :19   1st Qu.:16.00   1st Qu.:22.25
                Median :18.00   Median :25.50
                Mean   :18.90   Mean   :25.80
                3rd
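
The position rule for the median (steps 1, 2.a and 2.b) and the "median of each half" definition of the quartiles can be sketched in R as follows. This is a minimal sketch, not code from the lecture: median_by_rule and quartiles_by_halves are hypothetical helper names, and the data are the 25 observations listed on the median slides, sorted before use. The last lines redo the mileage arithmetic from the Mean slide using only the totals given there.

```r
# The 25 observations from the median slides; sort() enforces step 1.
x <- sort(c(0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
            3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1))

# Hypothetical helper (not from the lecture): median by the position rule.
median_by_rule <- function(v) {
  v <- sort(v)
  n <- length(v)
  if (n %% 2 == 1) {
    v[(n + 1) / 2]                  # 2.a: n odd, observation (n+1)/2
  } else {
    (v[n / 2] + v[n / 2 + 1]) / 2   # 2.b: n even, mean of the two middle observations
  }
}

# Hypothetical helper: Q1 and Q3 as the medians of the lower and upper halves.
# Assumption: when n is odd, the overall median is left out of both halves.
quartiles_by_halves <- function(v) {
  v <- sort(v)
  n <- length(v)
  stopifnot(n >= 2)
  half <- n %/% 2
  c(Q1 = median_by_rule(v[1:half]),
    Q3 = median_by_rule(v[(n - half + 1):n]))
}

m25 <- median_by_rule(x)         # n = 25 (odd case)
m24 <- median_by_rule(x[1:24])   # n = 24 (even case: largest observation dropped)
q   <- quartiles_by_halves(x)

# Mileage example from the Mean slide: sum 490 over 19 cars, one outlier at 68.
mean_all     <- 490 / 19
mean_without <- (490 - 68) / (19 - 1)

print(c(median_n25 = m25, median_n24 = m24))
print(q)
print(round(c(mean_all = mean_all, mean_without = mean_without), 1))
```

The built-in median(x) should agree with median_by_rule(x). R's quantile() uses a different interpolation rule by default, so its quartiles can differ slightly from the median-of-halves values.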

