WUSTL CSE 567M - Summarizing Measured Data

12-1: Summarizing Measured Data

Raj Jain
Washington University in Saint Louis
Saint Louis, MO
[email protected]
Slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-06/

©2006 Raj Jain, CSE567M, Washington University in St. Louis

12-2: Overview

- Basic Probability and Statistics Concepts: CDF, PDF, PMF, Mean, Variance, CoV, Normal Distribution
- Summarizing Data by a Single Number: Mean, Median, and Mode; Arithmetic, Geometric, Harmonic Means
- Mean of a Ratio
- Summarizing Variability: Range, Variance, Percentiles, Quartiles
- Determining Distribution of Data: Quantile-Quantile Plots

12-3: Part III: Probability Theory and Statistics

1. How to report the performance as a single number? Is specifying the mean the correct way?
2. How to report the variability of measured quantities? What are the alternatives to variance and when are they appropriate?
3. How to interpret the variability? How much confidence can you put on data with a large variability?
4. How many measurements are required to get a desired level of statistical confidence?
5. How to summarize the results of several different workloads on a single computer system?
6. How to compare two or more computer systems using several different workloads? Is comparing the mean sufficient?
7. What model best describes the relationship between two variables? Also, how good is the model?

12-4: Basic Probability and Statistics Concepts

- Independent Events: Two events are called independent if the occurrence of one event does not in any way affect the probability of the other event.
- Random Variable: A variable is called a random variable if it takes one of a specified set of values with a specified probability.

12-5: CDF, PDF, and PMF

- Cumulative Distribution Function: $F_x(a) = P(x \le a)$
- Probability Density Function: $f(x) = \frac{dF(x)}{dx}$

[Figure: CDF F(x), rising from 0 to 1, and pdf f(x), each plotted against x]

12-6: CDF, PDF, and PMF (Cont)

- Given a pdf f(x): $P(x_1 < x \le x_2) = F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(x)\,dx$
- Probability Mass Function: For discrete random variables: $P(x_1 < x \le x_2) = F(x_2) - F(x_1) = \sum_{i:\, x_1 < x_i \le x_2} p_i$

[Figure: pmf f(x_i) plotted against x_i]

12-7: Mean, Variance, CoV

- Mean or Expected Value: $\mu = E(x) = \sum_{i=1}^{n} p_i x_i$ (discrete) or $\mu = E(x) = \int_{-\infty}^{+\infty} x f(x)\,dx$ (continuous)
- Variance: The expected value of the square of the distance between x and its mean: $\sigma^2 = \mathrm{Var}(x) = E[(x - \mu)^2]$
- Coefficient of Variation: $\mathrm{CoV} = \sigma / \mu$

12-8: Covariance and Correlation

- Covariance: $\mathrm{Cov}(x, y) = E[(x - \mu_x)(y - \mu_y)] = E(xy) - E(x)E(y)$
- For independent variables, the covariance is zero, because $E(xy) = E(x)E(y)$.
- Although independence always implies zero covariance, the reverse is not true.
- Correlation Coefficient: the normalized value of the covariance: $\rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$
  The correlation always lies between -1 and +1.

12-9: Mean and Variance of Sums

- If $x_1, x_2, \ldots, x_k$ are k random variables and if $a_1, a_2, \ldots, a_k$ are k arbitrary constants (called weights), then: $E(a_1 x_1 + a_2 x_2 + \cdots + a_k x_k) = a_1 E(x_1) + a_2 E(x_2) + \cdots + a_k E(x_k)$
- For independent variables: $\mathrm{Var}(a_1 x_1 + a_2 x_2 + \cdots + a_k x_k) = a_1^2 \mathrm{Var}(x_1) + a_2^2 \mathrm{Var}(x_2) + \cdots + a_k^2 \mathrm{Var}(x_k)$

12-10: Quantiles, Median, and Mode

- Quantile: The x value at which the CDF takes a value α is called the α-quantile or 100α-percentile. It is denoted by $x_\alpha$: $P(x \le x_\alpha) = F(x_\alpha) = \alpha$
- Median: The 50-percentile (or 0.5-quantile) of a random variable is called its median.
- Mode: The most likely value, that is, the $x_i$ that has the highest probability $p_i$, or the x at which the pdf is maximum, is called the mode of x.

[Figure: CDF F(x) with the levels 0.25, 0.50, and 0.75 marked, and pdf f(x), each plotted against x]
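The definitions of mean, variance, coefficient of variation, and α-quantile can be checked numerically for a discrete random variable. The following is a minimal sketch, not part of the original slides: the function names `pmf_summary` and `pmf_quantile` and the fair-die example are hypothetical, and the quantile uses the common convention "smallest x with F(x) >= α", which is an assumption because the CDF of a discrete variable may never equal α exactly.

```python
import math


def pmf_summary(values, probs):
    """Return (mean, variance, CoV) of a discrete random variable given its PMF."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Mean: sum of p_i * x_i
    mean = sum(p * x for x, p in zip(values, probs))
    # Variance: expected squared distance from the mean
    variance = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
    # Coefficient of variation: standard deviation over mean (undefined if mean is 0)
    cov = math.sqrt(variance) / mean
    return mean, variance, cov


def pmf_quantile(values, probs, alpha):
    """Return the smallest x whose CDF F(x) is at least alpha (assumed convention)."""
    cumulative = 0.0
    for x, p in sorted(zip(values, probs)):
        cumulative += p
        # Small tolerance guards against floating-point round-off in the running sum
        if cumulative >= alpha - 1e-12:
            return x
    return max(values)


# Hypothetical example: a fair six-sided die
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
mean, variance, cov = pmf_summary(faces, probs)
median = pmf_quantile(faces, probs, 0.5)
print(mean, variance, cov, median)
```

For a continuous distribution the sums would be replaced by integrals of the pdf, as in the definitions above.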
12-11: Normal Distribution

- Normal Distribution: The sum of a large number of independent observations from any distribution tends to have a normal distribution. Its pdf is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$, $-\infty < x < +\infty$
- A normal variate is denoted as N(μ, σ).
- Unit Normal: A normal distribution with zero mean and unit variance. Also called the standard normal distribution and denoted as N(0,1).

12-12: Normal Quantiles

- An α-quantile of a unit normal variate z ∼ N(0,1) is denoted by $z_\alpha$. If a random variable x has a N(μ, σ) distribution, then (x-μ)/σ has a N(0,1) distribution: $P\left(\frac{x - \mu}{\sigma} \le z_\alpha\right) = \alpha$ or $P(x \le \mu + z_\alpha \sigma) = \alpha$

12-13: Why Normal?

- There are two main reasons for the popularity of the normal distribution:
  1. The sum of n independent normal variates is a normal variate. If $x_i \sim N(\mu_i, \sigma_i)$, then $x = \sum_{i=1}^{n} a_i x_i$ has a normal distribution with mean $\mu = \sum_{i=1}^{n} a_i \mu_i$ and variance $\sigma^2 = \sum_{i=1}^{n} a_i^2 \sigma_i^2$.
  2. The sum of a large number of independent observations from any distribution tends to have a normal distribution. This result, called the central limit theorem, holds for observations from any distribution with finite variance.
     => Experimental errors caused by many factors are normal.

12-14: Summarizing Data by a Single Number

- Indices of central tendencies: Mean, Median, Mode
- Sample Mean is obtained by taking the sum of all observations and dividing this sum by the number of observations in the sample.
- Sample Median is obtained by sorting the observations in increasing order and taking the observation that is in the middle of the series. If the number of observations is even, the mean of the middle two values is used as the median.
- Sample Mode is obtained by plotting a histogram and specifying the midpoint of the bucket where the histogram peaks. For categorical variables, the mode is given by the category that occurs most frequently.
- Mean and median always exist and are unique.
  Mode, on the other hand, may not exist.

12-15: Mean, Median, and Mode: Relationships

12-16: Selecting Mean, Median, and Mode

12-17: Indices of Central Tendencies: Examples

- Most used resource in a system: Resources are categorical and hence mode must be used.
- Interarrival time: Total time is of interest and so mean is the proper choice.
- Load on a Computer: Median is preferable due to a highly skewed distribution.
- Average Configuration: Medians of number of devices, memory sizes, number of
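The sample mean, median, and mode described above can be computed with Python's standard library. The following is a minimal sketch, not part of the original slides: the helper names `summarize` and `categorical_mode` are hypothetical, and both data sets are made up for illustration. The load sample contains one large outlier, which shows why the median is preferred for skewed data, and the resource list shows the mode of a categorical variable.

```python
import statistics
from collections import Counter


def summarize(sample):
    """Return (mean, median) of a numeric sample."""
    return statistics.mean(sample), statistics.median(sample)


def categorical_mode(observations):
    """Return the most frequent categories, sorted (more than one if there is a tie)."""
    counts = Counter(observations)
    top = max(counts.values())
    return sorted(c for c, n in counts.items() if n == top)


# Made-up load measurements with one large outlier (a skewed sample)
loads = [2, 3, 3, 4, 5, 6, 40]
mean_load, median_load = summarize(loads)

# Made-up record of which resource was busiest in each interval
busiest = ["disk", "cpu", "disk", "network", "disk", "cpu"]

print(mean_load, median_load, categorical_mode(busiest))
```

In this sample the single outlier pulls the mean well above the median, while the median stays near the bulk of the observations.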

