UVA STAT 2120 - MT1+Review+Notes+Highlighted


STAT 2120: Notes on Topic 1

Introduction to Examining Distributions:
• A variable records characteristics of cases (i.e., objects of interest) in its values.
• Classify a variable by its possible values:
  o Categorical: records group labels; numeric labels mean nothing, except possibly order.
  o Quantitative: records meaningful numbers; may be discrete or continuous.
• A time series is a record of values across time.
• A variable’s distribution describes the counts or relative proportions of its values.
• Exploratory data analysis seeks to describe distributions and relationships in data.

Displaying a distribution with graphs:
• Bar graphs and pie charts describe the distribution of a categorical variable.
  o Bar graphs emphasize counts; pie charts, proportions.
  o A Pareto chart is a bar graph with categories ordered by decreasing frequency.
• Histograms are essentially bar graphs of a quantitative variable.
  o Bar widths are not fixed by the data; use equal bar widths and “eyeball” the choice that gives the best picture.
  o Look for overall pattern, shape, center, spread, deviations in shape, and “outlier” deviations.
  o A symmetric distribution is one whose histogram mirrors itself about its center.
  o A right- or left-skewed distribution shows a long tail to the right or left in its histogram.
• Stemplots are back-of-the-envelope histograms drawn with the digits of quantitative values.
  o “Stem” digits define bars; “leaf” digits display counts and sub-counts.
  o Customize by rounding digits and splitting stems.
• Time plots graph time series values by time.
  o Emphasize patterns of change over time, such as trends and seasonal variations.

Describing distributions with numbers:
• Denote by $x_1, \ldots, x_n$ the values of $n$ observations.
• The $p$th percentile is a number such that $p$ percent of values fall on or below it.
• Describe a distribution with numerical summaries of shape, center, and spread.
• A summary is resistant if it is insensitive to changes in skewness or extreme values.
• Measure of center: mean, $\bar{x}$
  o $\bar{x} = \frac{1}{n} \sum x_i$, the arithmetic average.
  o $\bar{x}$ is not resistant.
• Measure of center: median, $M$
  o $M$ is the 50th percentile.
  o Calculate as the middle value or the average of the two middle values.
  o $M$ is resistant.
• Measure of spread: extreme values
  o Smallest and largest values.
  o Extreme values are not resistant.
• Measure of spread: quartiles, $Q_1$ and $Q_3$
  o $Q_1$ is the 25th percentile; $Q_3$ is the 75th percentile.
  o Calculate $Q_1$ and $Q_3$ as medians of values falling to the left or right of (but not on) $M$.
  o $Q_1$ and $Q_3$ are resistant.
• Measure of spread: standard deviation, $s$
  o $s = \sqrt{s^2}$, where $s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$, a rescaled average of squared deviations from $\bar{x}$.
  o $s^2$ is the variance; $n - 1$ is the “degrees of freedom;” take the square root to match units with $\bar{x}$.
  o Calculate by computer.
  o $s$ is not resistant.
• Measure of shape: mean-median comparisons:
  o $\bar{x} \approx M$ if the distribution is symmetric; $\bar{x} > M$ if right-skewed; $\bar{x} < M$ if left-skewed.
• Useful descriptions of a distribution:
  o Summarize center and spread, e.g., as $\bar{x}$ and $s$ (for symmetric, outlier-free distributions).
  o Display $\bar{x}$ and $s$ graphically as “error bars.”
  o Five-number summary: smallest extreme, $Q_1$, $M$, $Q_3$, largest extreme.
  o Display the five-number summary graphically as a box plot.

Normal distributions:
• A density curve is an idealization for describing patterns seen in histograms.
  o Denote by $X$ a variable representing an idealized observation.
  o “Area under the curve” in a range represents the proportion of observations in that range.
  o Total “area under the curve” is one.
  o The median is the point that divides “area under the curve” equally to the left and right.
  o Denote by $\mu$ and $\sigma$ idealizations of $\bar{x}$ and $s$ formulated on a density curve.
  o $\mu$ is the balance point.
• Normal distributions are described by the class of density curves called “Normal curves.”
  o Symmetric, single-peaked, and bell-shaped.
  o Indexed by $\mu$ and $\sigma$, denoted $N(\mu, \sigma)$.
  o $\mu \pm \sigma$ mark a Normal curve’s inflection points.
  o 68-95-99.7 rule: For observations having a Normal distribution, 68% fall within $\mu \pm \sigma$; 95% fall within $\mu \pm 2\sigma$; and 99.7% fall within $\mu \pm 3\sigma$.
  o The standard Normal distribution is $N(0, 1)$.
• Suppose $X$ has a distribution with mean $\mu$ and standard deviation $\sigma$. The z-score, or standardized value, of $x$ is $z = (x - \mu)/\sigma$.
  o Measures “location from $\mu$ in units of $\sigma$.”
  o If $X$ is $N(\mu, \sigma)$ then $Z$ is $N(0, 1)$.
  o To calculate an “area under the curve” for $N(\mu, \sigma)$, translate to a z-score and use $N(0, 1)$.
• Calculations involving $N(\mu, \sigma)$ might be forward (What proportion $p$ has $X \le x$?) or backward (For what $x$ is the proportion of $X \le x$ equal to $p$?).
• A Normal quantile plot is a graph of percentiles of $x_1, \ldots, x_n$ plotted against those of $N(0, 1)$.
  o Plots on a straight line indicate a Normal distribution.
  o Calculate by computer.

Introduction to Examining Relationships:
• Approach: plot data, calculate summaries; look for patterns and deviations; consider idealizations.
• An explanatory (or independent) variable explains variability in the response (or dependent) variable.
• Scatterplots: graph two quantitative variables measured on the same set of individuals.
  o Look for overall pattern, general deviations, and “outlier” deviations.
  o Scatterplots are sometimes “smoothed” using algorithms that fit curves to the data.
  o A transformation (e.g., the log transformation) is sometimes applied to skewed data.
  o A scatterplot can be extended by adding categorical variables, color- or symbol-coded.
• The overall pattern of a relationship:
  o The form of a relationship may involve linear patterns, clusters, or lack of any pattern.
  o The direction may be positive or negative.
  o A stronger relationship is observed as points falling more closely to a clear form.

Correlation:
• Measure of direction and strength: correlation, $r$
  o $r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$, the rescaled average of the product of standardized deviations from $\bar{x}$ and $\bar{y}$.
  o Calculate by
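
The z-score and the forward and backward Normal calculations described in the Normal distributions notes above can be sketched in Python with the standard library's `statistics.NormalDist`. This is a minimal illustration, not part of the course notes: the function names are hypothetical, the values $\mu = 100$ and $\sigma = 15$ are made up for the example, and proportions are assumed to be left-tail areas (values at or below a point).

```python
from statistics import NormalDist

# Standard Normal distribution N(0, 1), used for every area calculation.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mu, sigma):
    """Standardize x: its location from mu in units of sigma."""
    return (x - mu) / sigma


def proportion_at_or_below(x, mu, sigma):
    """Forward calculation: area under N(mu, sigma) at or below x.

    Translates x to a z-score, then uses N(0, 1).
    """
    return STANDARD_NORMAL.cdf(z_score(x, mu, sigma))


def value_at_proportion(p, mu, sigma):
    """Backward calculation: the x with left-tail area p under N(mu, sigma).

    Finds the z-score with area p under N(0, 1), then unstandardizes.
    """
    z = STANDARD_NORMAL.inv_cdf(p)
    return mu + z * sigma


# Hypothetical example values (assumptions, not taken from the notes).
mu, sigma = 100.0, 15.0

print("z-score of 130:", z_score(130.0, mu, sigma))
print("proportion at or below 130:", proportion_at_or_below(130.0, mu, sigma))
print("90th percentile:", value_at_proportion(0.90, mu, sigma))

# Check the 68-95-99.7 rule: area within k standard deviations of the mean.
for k in (1, 2, 3):
    area = (proportion_at_or_below(mu + k * sigma, mu, sigma)
            - proportion_at_or_below(mu - k * sigma, mu, sigma))
    print(f"within {k} sd:", round(area, 4))
```

Routing both calculations through $N(0, 1)$ mirrors the "translate to a z-score" step in the notes; calling `NormalDist(mu, sigma).cdf(x)` directly would give the same area.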

