# UA AEM 201 - Characteristics of Data and Data Analysis

Type: Lecture Note
Pages: 3

## Lecture 8

PREVIOUS LECTURE

I. Measures of Variability or Dispersion: Quantitative Data
II. Using Measures of Relative Location to Identify Outliers

CURRENT LECTURE

I. Other Characteristics of Data Distribution Shapes
II. Other Tools for Exploratory Data Analysis (EDA)
III. Measures of Association Between Two Variables
IV. Putting It All Together

## Other Characteristics of Data Distribution

- Skewness: the degree to which a distribution is asymmetric
  - Skewed left means the left tail is longer than the right tail. Skewed right means the right tail is longer than the left tail.
  - Income, value of businesses, value of homes, etc. are all right skewed
- Skewness does not have an effect on the median
  - The median is insensitive to extremes
- Skewness does affect the mean
  - The mean is sensitive to extremes
- Note that the mean > Md for a positively skewed population and the mean < Md for a negatively skewed population
- Kurtosis: the degree of relative peakedness of a data distribution. Also a measure of the heaviness of the tails of a distribution
  - Leptokurtic: relatively peaked with heavy tails
  - Platykurtic: relatively flat with thin tails
  - Mesokurtic: in between the two (the normal distribution is the reference case)

## Other Tools for Exploratory Data Analysis (EDA)

- Five Number Summary: use of the minimum data value, the first quartile, the median (or second quartile), the third quartile, and the maximum data value to summarize a data set
- Box Plot: graphical display of the results of a five number summary and outliers. One possible set of steps to construct a box plot from a data array and its five number summary:
  1. Create a horizontal axis of an appropriate length as the basis of the box plot
  2. Draw a box with vertical ends located at quartile 1 and quartile 3
  3. Draw a vertical line through the median
  4. Develop the inner fences: quartile 1 - 1.5*IQR and quartile 3 + 1.5*IQR. Draw horizontal lines from the ends of the box to the smallest and largest values inside the fences
  5. Classify all observations outside of the inner fences as outliers. If any outliers exist, indicate their locations on the box plot using some symbol such as a star or an asterisk
- Box plots can illustrate location, variation, skewness, and kurtosis (whether there are long tails or high peaks)
- Box plots are easy to make for large data sets as long as you can get the five number summary
- You can make side by side box plots for easy comparison

## Measures of Association Between Two Variables

- Covariance: numerical measure of linear association between two quantitative variables
  - Calculated as (sample covariance): $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
- When will covariance be positive? Negative? Zero?
  - Consider a scatter plot and draw lines through the sample mean for x and the sample mean for y
  - $(x_i - \bar{x})(y_i - \bar{y})$ is positive in quadrants I and III (and negative in quadrants II and IV)
  - If most points lie in quadrants I and III, the covariance, and the slope of the pattern, will be positive
  - Points spread across all quadrants suggest no linear relationship
  - The sign of the covariance suggests a negative or positive relationship
- x and y go together. DO NOT PUT THEM IN AN ARRAY. Keep each pair of x and y values together
- The problem with covariance is that it has units embedded in it, so its magnitude depends on the units of x and y
- Correlation: standardized numerical measure of linear association between two variables. Range is generally -1 to 1.
- Pearson's Product Moment Correlation Coefficient: standardized numerical measure of linear association between two quantitative variables. Range is -1 to 1.
  - Calculated as (sample version): $r_{xy} = \frac{s_{xy}}{s_x s_y}$, where $s_x$ and $s_y$ are the sample standard deviations of x and y
- When will the correlation coefficient be positive? Negative? Zero?
  - The denominator can only be positive, but the numerator can be positive, negative, or zero. Therefore, the correlation coefficient can be positive, negative, or zero.
  - The covariance determines whether the correlation is negative or positive
  - If all the data lie on a perfect line, the correlation will be 1 or -1 depending on whether the slope is positive or negative
  - We look at magnitude: how far the correlation coefficient is from 0. The farther from 0, the stronger the relationship
  - We look at sign. The sign tells us whether the relationship is negative or positive
  - Aggregated data tend to show stronger relationships
  - We must know the context in order to say whether the data have a strong relationship or not
- BEWARE of the strong tendency to fall victim to the cum hoc ergo propter hoc (Latin for "with this, therefore because of this") fallacy
- Correlation is a necessary but not sufficient condition for causality
- A strong covariance or correlation between two random variables does not imply that a causal relationship exists between them. There are several alternative explanations:
  - Reverse causality (retro causation)
  - Common cause (lurking variable)
  - Two-way (bilateral) causation
  - Spurious correlation (pure coincidence)
- If you're looking for a pattern in data, you'll find one. That doesn't mean the pattern is real
- Pearson's product moment correlation coefficient is only appropriate for measuring linear association between two quantitative variables. There are other measures of correlation more appropriate for variables at other levels of measurement
- Just know that we have only learned the linear measure

## Putting It All Together

- Recall what a data dashboard is
- Include whatever works best: graphics, tables, summary data
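
The five number summary, inner fences, and outlier rule from the box plot steps in these notes can be sketched in Python. This is a minimal illustration, not the course's official method: the function names and the `incomes` data are hypothetical, and the quartiles use the standard library's "inclusive" interpolation, which is only one of several textbook conventions and may give slightly different quartiles than the rule used in class.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # Assumption: "inclusive" interpolation; other quartile rules exist.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def inner_fences(q1, q3):
    """Inner fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def box_plot_parts(values):
    """Collect the numbers needed to draw a box plot by hand."""
    minimum, q1, median, q3, maximum = five_number_summary(values)
    lower, upper = inner_fences(q1, q3)
    inside = [v for v in values if lower <= v <= upper]
    outliers = sorted(v for v in values if v < lower or v > upper)
    return {
        "five_number": (minimum, q1, median, q3, maximum),
        "fences": (lower, upper),
        # Whiskers stop at the most extreme values inside the fences.
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


# Hypothetical right-skewed incomes (in thousands); made up for illustration.
incomes = [28, 31, 35, 38, 40, 42, 45, 47, 52, 58, 150]
parts = box_plot_parts(incomes)
print(parts["five_number"])
print(parts["fences"])
print(parts["outliers"])

# The single large value pulls the mean above the median (right skew).
print(statistics.mean(incomes) > statistics.median(incomes))
```

With this made-up data the one extreme income lands outside the upper fence, and the mean exceeds the median, which matches the notes' point that the mean is sensitive to extremes while the median is not.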

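
The covariance and Pearson correlation bullets in these notes can also be worked through in a short Python sketch. It assumes the sample (n - 1) form of covariance; the function names and the `hours`, `scores`, and `perfect_line` data are hypothetical and exist only to show the behavior described in the notes: x and y stay paired, covariance changes with units, and correlation does not.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance; xs[i] and ys[i] must stay paired."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Each product is positive in quadrants I and III, negative in II and IV.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / (n - 1)


def sample_std(values):
    """Sample standard deviation (covariance of a variable with itself)."""
    return math.sqrt(sample_covariance(values, values))


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations.

    Undefined (division by zero) if either variable is constant.
    """
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical paired data; made up for illustration.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
print(sample_covariance(hours, scores))
print(round(pearson_r(hours, scores), 4))

# Changing units (hours -> minutes) rescales covariance but not correlation.
minutes = [60 * h for h in hours]
print(sample_covariance(minutes, scores))
print(round(pearson_r(minutes, scores), 4))

# Points on a perfect downward-sloping line give a correlation of -1.
perfect_line = [10 - 2 * h for h in hours]
print(round(pearson_r(hours, perfect_line), 4))
```

Dividing by the standard deviations is what removes the embedded units, which is why the correlation is easier to interpret than the raw covariance.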