BIOM301 Chapter 2: Descriptive Analysis, Single-Variable Data (9/6/2012, 9/11/2012)

Variables can be classified into several categories, and how you summarize data depends on the variable type.
o Variable types: quantitative (numerical value) and qualitative (attribute)
o Overview: Summary of Graphs; Measures of Central Tendency; Measures of Dispersion; Measures of Position; The Art of Statistical Deception

A. Summary of Graphs
  - Quantitative (numerical value): stem-and-leaf diagrams, frequency histograms
  - Qualitative (attribute): circle graphs, bar graphs

Qualitative Data
o Often displayed with circle graphs and bar graphs
o Shows the relative proportion in each category
o A graph can show frequency or relative frequency; both provide similar information
  - Frequency: the number of observations in each category
  - Relative frequency: the proportion (or percentage) of all observations that fall in that category
o Circle graph components
  - Size of each section = relative proportion in that category
  - Informative title
  - Legend
o Bar graph components
  - Same as circle graphs, but bars show the relative proportions
  - Bars represent frequency or relative frequency
  - Informative graph title
  - Each axis labeled, with a legend
  - A SPACE between bars (indicates qualitative data)
o Circle graph vs. bar graph
  - Both are used to display sample results
  - Pick the one that shows the information most clearly

Quantitative Data
o Simplest display: the stem-and-leaf diagram
  - Steps:
    1. Decide what will be the stem and what will be the leaf.
    2. Go through the data and place each leaf value next to its stem value.
  - Components: a title identifying the variable, and a key explaining the stem and leaf components
o Frequency distributions and frequency histograms
  - Show how frequently (f) each value (x) of a quantitative variable occurs
o Ungrouped vs. grouped data
  - Simply counting the number of times each value occurs works fine for some data sets (ungrouped data).
  - Larger data sets need to be summarized. One way is to create groups of similar values, called classes.
  - Grouping data: divide the data into equally spaced classes; classes cannot overlap.
  - A good rule of thumb: number of classes = square root of n (n = sample size), rounded
o Frequency / relative frequency histogram components
  - Title identifying the graph
  - Vertical axis label
  - Horizontal axis label with class boundaries
  - Bars with no gaps between them (quantitative data)
o Describing shape: the shape of a histogram provides information, so you need to be able to describe it
  - Symmetric: one side of the graph is a mirror image of the other side
  - Uniform: every value occurs with the same frequency
  - Normal: symmetric, mounded up around the mean, and sparse at the extremes. All normal curves are symmetric, but NOT all symmetric curves are normal.
  - Skewed: one tail is stretched out longer than the other
  - J-shaped: no tail on the side with the highest frequency
  - Bimodal: two peaks
o Outliers
  - Not always present
  - Value(s) that fall far away from the rest of the data

B. Measures of Central Tendency
Tell you where the middle of your sample data occurs.
o Mean: the arithmetic mean (sum of the values divided by n)
o Median: the middle value of the ordered data
o Mode: the most frequent observation
o Midrange: the number exactly midway between the lowest and highest data values, (lowest + highest) / 2

C. Effects on Measures of Center
o If the data are symmetric and unimodal, then mean = median = mode = midrange.
o If the data are NOT symmetric, the mean is affected the most.
o The mean and midrange are affected the most by outliers; report the median if outliers are suspected in the data.

D. So why do you usually see means, not medians, reported?
o The mean is the only measure of center that uses all the observed values in its calculation. The median uses only the middle value(s), the mode only the most frequent value, and the midrange only the largest and smallest values.

E. Measures of Dispersion
o Range = highest sample value - lowest sample value
o Variance (s²): essentially the average squared deviation of the data from the mean (for a sample, the sum of squared deviations is divided by n - 1)
o Standard deviation (s): the square root of the variance; measures the typical distance of the values from the mean, in the same units as the data
  - Never negative; s = 0 only if all values are exactly the same
  - Increases as the variability in the data set increases

F. A distribution of sample values can be described by the mean and the standard deviation
o The mean gives an estimate of the center of the data.
o The standard deviation gives an estimate of the spread of the data.

G. Measures of Position
o Summarizing sample data requires measures of center AND spread.
o Box-and-whisker plots are a useful way to summarize data.
  - The plot provides 5 pieces of information (the five-number summary: minimum, Q1, median, Q3, maximum).
  - You need to be able to interpret these plots, but not generate them.
o Quartiles and percentiles
  - Order the data from smallest to largest value.
  - Percentiles and quartiles are data values that divide the ordered observations into intervals; quartiles divide them into four intervals of 25% each. The median splits the data into two halves, and taking the median of each half gives the first (Q1) and third (Q3) quartiles.
o (Figure: box-and-whisker plot)

Shifting from Frequency Histograms to Density Curves
o With enough data, the pattern can be displayed as a smooth curve: a density curve, which is a theoretical description of the data distribution.
o Density curve properties
  - The line is always on or above the horizontal axis; the vertical axis is not labeled
  - The area under the curve = 1.0
  - Can take on any shape
o Important density curve: the normal curve
  - Bell-shaped, symmetric, unimodal
  - Its shape is described by the mean and the standard deviation
  - There is a whole family of normal curves
o Normal probability distribution function (YOU DO NOT NEED TO KNOW THIS)
  - Important to note: it is defined by 2 parameters, (1) μ, the population mean, and (2) σ, the population standard deviation.
o Statistics vs. parameters
  - x̄: sample mean; μ: population mean
  - s: sample standard deviation; σ: population standard deviation
  - s²: sample variance; σ²: population variance
  - Why are we using population terms in the normal curve equations?
o The inflection points of a normal curve are located 1 standard deviation from the mean (at μ - σ and μ + σ).
o 68-95-99.7 rule: for any normal curve, whatever its mean and standard deviation,
  - 68% of observations lie between μ - σ and μ + σ
  - 95% lie between μ - 2σ and μ + 2σ
  - 99.7% lie between μ - 3σ and μ + 3σ
o Z-scores: an easier way to do the same thing as above
  - A z-score establishes the position of a value x, measured in the number of standard deviations from the mean:
    z = (x - μ) / σ   (your value minus the population mean, divided by the population standard deviation)
  - z = -1: 1 standard deviation below the mean; z = +1: 1 standard deviation above the mean

Rounding Rule
o When calculating a statistic, carry one more decimal place than is present in the original set of data.
o Example: the mean of 4, 6, 9, 20 is 9.8.

H. The Art of Statistical Deception
o Means are affected by outliers
  - Choosing which measure of center to report (median vs. mean) is a good way to hide the effects of a few outliers.
o Confusing graphs
  - Not showing the full scale
  - Using pictures or figures instead of bars (increasing a figure's size distorts the proportions)
  - Using 3-D bar graphs
o Misrepresentation: correlation is not causation
  - Correlation occurs when 2 variables seem to change together. However, if this has not been tested experimentally, you cannot imply that X
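The qualitative-data notes above distinguish frequency (a count per category) from relative frequency (that count as a share of all observations). A minimal Python sketch, assuming the observations arrive as a plain list of category labels; the function name `frequency_table` and the eye-colour sample are hypothetical, made up for illustration:

```python
from collections import Counter


def frequency_table(observations):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(observations)      # frequency: number of observations per category
    n = len(observations)
    return {category: (f, f / n) for category, f in counts.items()}


# Hypothetical sample of 10 eye colours (illustrative data, not from the course)
eye_colour = ["brown", "blue", "brown", "green", "brown",
              "blue", "brown", "hazel", "blue", "brown"]

for category, (f, rel) in sorted(frequency_table(eye_colour).items()):
    print(f"{category:<6} f = {f}   relative f = {rel:.2f}")
```

Either column could drive a circle graph or a bar graph. Relative frequencies always sum to 1 (100%), which is why both versions of a graph carry similar information.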
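The two stem-and-leaf steps in the notes (choose stem and leaf, then place each leaf next to its stem) can be sketched in Python. This assumes non-negative whole numbers with the tens digit(s) as the stem and the ones digit as the leaf; the function `stem_and_leaf` and the ages are hypothetical:

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf diagram for non-negative integers.

    Assumption: stem = value // 10 (tens), leaf = value % 10 (ones).
    """
    stems = defaultdict(list)
    for value in sorted(data):               # sorting keeps each row's leaves in order
        stems[value // 10].append(value % 10)
    rows = []
    for stem in range(min(stems), max(stems) + 1):   # keep empty stems so gaps stay visible
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows


# Hypothetical ages in years (illustrative only)
ages = [12, 15, 21, 24, 24, 41, 47, 49, 55]
print("Hypothetical ages (years)")           # title identifying the variable
print("Key: 2 | 1 = 21 years")               # key explaining stem and leaf
for row in stem_and_leaf(ages):
    print(row)
```

The empty " 3 | " row is kept on purpose: a gap in the stems is part of the shape information the diagram is meant to show.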
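The grouping rules in the notes (equal-width classes, no overlap, about square root of n classes) can be sketched as below. Assumptions: classes are half-open intervals [lower, upper) with the last class also including the maximum, and the class width is simply (max - min) / k; a course may instead place class boundaries halfway between recorded values. The function `grouped_frequency` and the data are hypothetical:

```python
import math


def grouped_frequency(data, num_classes=None):
    """Group quantitative data into equal-width, non-overlapping classes.

    By default uses round(sqrt(n)) classes (rule of thumb from the notes).
    Each class covers [lower, upper); the last class also includes the maximum.
    """
    n = len(data)
    k = num_classes or max(1, round(math.sqrt(n)))
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # Class index for x; min() puts the maximum value into the last class
        i = min(int((x - lo) / width), k - 1) if width > 0 else 0
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(k)]


# Hypothetical sample of n = 16 values, so sqrt(n) = 4 classes
values = [10, 12, 15, 18, 21, 22, 25, 27, 29, 33, 35, 38, 41, 44, 47, 50]
for lower, upper, f in grouped_frequency(values):
    print(f"{lower:g} to {upper:g}: f = {f}")
```

Dividing each count by n would give the relative frequency histogram instead.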
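The notes on measures of center and their sensitivity to outliers can be checked with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The helper `center_summary` and both samples are hypothetical:

```python
from statistics import mean, median, multimode


def center_summary(data):
    """Mean, median, mode(s), and midrange of a sample."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": multimode(data),                  # a list, since a sample can have several modes
        "midrange": (min(data) + max(data)) / 2,  # halfway between lowest and highest values
    }


# Hypothetical sample, then the same sample with one added outlier
sample = [4, 6, 9, 9, 12]
with_outlier = sample + [60]

print(center_summary(sample))        # mean 8, median 9, mode [9], midrange 8.0
print(center_summary(with_outlier))  # mean ~16.7, median 9.0, mode [9], midrange 32.0
```

The median stays at 9 while the mean and midrange jump, which is the reason the notes give for reporting the median when outliers are suspected.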
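The dispersion measures in the notes follow from the sample formula s² = Σ(x - x̄)² / (n - 1) and s = √s². A minimal Python sketch (needs at least 2 values); the helper `dispersion` and the data are hypothetical, and the results are cross-checked against the standard library's `statistics.variance` and `statistics.stdev`:

```python
import math
import statistics


def dispersion(data):
    """Range, sample variance s^2, and sample standard deviation s (needs n >= 2)."""
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = [(x - x_bar) ** 2 for x in data]
    variance = sum(squared_deviations) / (n - 1)   # sample variance divides by n - 1
    return {
        "range": max(data) - min(data),
        "variance": variance,
        "sd": math.sqrt(variance),                  # s is the square root of s^2
    }


# Hypothetical sample (illustrative only)
data = [2, 4, 4, 4, 5, 5, 7, 9]
result = dispersion(data)
print(result)                  # range 7, variance 32/7 (about 4.571), sd about 2.138

# Cross-check against the standard library's sample statistics
assert math.isclose(result["variance"], statistics.variance(data))
assert math.isclose(result["sd"], statistics.stdev(data))

# Identical values give s = 0, the only case where s is not positive
print(dispersion([5, 5, 5])["sd"])   # 0.0
```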
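The five pieces of information in a box-and-whisker plot can be computed with the "median of the two halves" approach from the notes. Assumption: when n is odd, the overall median is left out of both halves (one common textbook convention; other tools, such as `statistics.quantiles` or spreadsheet functions, interpolate differently and can give slightly different quartiles). The function name `five_number_summary` is hypothetical:

```python
from statistics import median


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) for at least 2 values.

    Q1 and Q3 are the medians of the lower and upper halves of the ordered data;
    when n is odd, the middle value is excluded from both halves (assumed convention).
    """
    xs = sorted(data)                    # order from smallest to largest
    n = len(xs)
    half = n // 2
    lower = xs[:half]                    # values below the median position
    upper = xs[half + n % 2:]            # values above it (skips the middle value when n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # (1, 2.5, 5, 7.5, 9)
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))      # (1, 2.5, 4.5, 6.5, 8)
```

The box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.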
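For reference only (the notes say the normal probability distribution function need not be memorized), its standard form makes the point in the notes concrete: μ and σ are the only two parameters.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```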
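The 68-95-99.7 rule in the notes holds for every normal curve, whatever its μ and σ. A quick numerical check, assuming Python 3.8 or later for `statistics.NormalDist`; the helper `share_within` and the three (μ, σ) pairs are hypothetical:

```python
from statistics import NormalDist


def share_within(mu, sigma, k):
    """Fraction of a normal distribution between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Three hypothetical normal curves with different centers and spreads
for mu, sigma in [(0, 1), (100, 15), (-3, 0.5)]:
    shares = [round(share_within(mu, sigma, k), 3) for k in (1, 2, 3)]
    print(f"mu={mu}, sigma={sigma}: {shares}")   # each row is [0.683, 0.954, 0.997]
```

The shares do not change from curve to curve, which is exactly why a single rule (and a single z-scale) works for the whole family of normal curves.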
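The z-score formula from the notes, z = (x - μ) / σ, as a one-line Python function. The name `z_score` and the population values (μ = 100, σ = 15) are hypothetical, chosen only for illustration:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean: z = (x - mu) / sigma."""
    return (x - mu) / sigma


# Hypothetical population with mean 100 and standard deviation 15
for x in (85, 100, 115, 130):
    print(x, z_score(x, mu=100, sigma=15))   # z = -1.0, 0.0, 1.0, 2.0
```

A negative z places the value below the mean and a positive z above it, matching the z = -1 and z = +1 cases in the notes.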
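The rounding-rule example can be checked by hand: the data are whole numbers, so the mean is reported to one decimal place.

```latex
\bar{x} = \frac{4 + 6 + 9 + 20}{4} = \frac{39}{4} = 9.75 \approx 9.8
```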

UMD BIOM 301 - Chapter 2 – Descriptive Analysis & Single Variable Data