Statistical Description of Data

Cf. NRiC, Chapter 14.

Statistics provides tools for understanding data. In the wrong hands these tools can be dangerous!

Here's a typical data analysis cycle:
1. Apply some formula to the data to compute a "statistic".
2. Find where the value falls in a probability distribution computed on the basis of some "null hypothesis".
3. If it falls in an unlikely spot (on a tail of the distribution), conclude that the null hypothesis is false for your data set.

Statistics

Statistics and probability theory are closely related. Statistics can never prove things, only disprove them by ruling out hypotheses.

Distinguish between model-independent statistics (this class, e.g. mean, median, mode) and model-dependent statistics (next class, e.g. least-squares fitting).

We will make use of special functions (e.g. the gamma function) described in NRiC, Chapter 6.

Moments of a Distribution

Cf. NRiC §14.1.

The mean, median, and mode of distributions are called measures of central tendency. The most common description of data involves its moments, sums of integer powers of the values. The most familiar moment is the mean:

    \bar{x} = \langle x \rangle = \frac{1}{N} \sum_{j=1}^{N} x_j

Variance

The width about the central value is estimated by the second moment, called the variance:

    \mathrm{Var}(x_1, \ldots, x_N) = \frac{1}{N-1} \sum_{j=1}^{N} (x_j - \bar{x})^2

or its square root, the standard deviation:

    \sigma = \sqrt{\mathrm{Var}}

Why N - 1? If the mean is known a priori, i.e. if it's not measured from the data, then use N, else N - 1. If this matters to you, then N is probably too small!

More on Moments

A clever way to minimize round-off error when computing the variance is to use the corrected two-pass algorithm. First compute \bar{x}, then do:

    \mathrm{Var} = \frac{1}{N-1} \left\{ \sum_{j=1}^{N} (x_j - \bar{x})^2 - \frac{1}{N} \left[ \sum_{j=1}^{N} (x_j - \bar{x}) \right]^2 \right\}

The second sum would be zero if \bar{x} were exact, but otherwise it does a good job of correcting the round-off error in Var.

Higher moments, like skewness (3rd moment) and kurtosis (4th moment), are also sometimes used.
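The corrected two-pass algorithm can be sketched in Python as follows (a minimal sketch; the function name is my own, not from NRiC):

```python
def corrected_two_pass_variance(data):
    """Variance via the corrected two-pass algorithm (cf. NRiC 14.1).

    First pass computes the mean; second pass accumulates both
    sum((x - mean)^2) and sum(x - mean). The second sum would be zero
    in exact arithmetic, so it acts as a round-off correction.
    Uses the N - 1 normalization (mean estimated from the data).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / n
    sum_sq = 0.0   # sum of squared deviations
    sum_dev = 0.0  # sum of deviations (round-off correction term)
    for x in data:
        d = x - mean
        sum_sq += d * d
        sum_dev += d
    return (sum_sq - sum_dev * sum_dev / n) / (n - 1)

# Example: sample variance of a small data set
print(corrected_two_pass_variance([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```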
Distribution Functions

A distribution function (DF) p(x) gives the probability of finding a value between x and x + dx. The expected mean data value is:

    \langle x \rangle = \frac{\int_{-\infty}^{\infty} x\, p(x)\, dx}{\int_{-\infty}^{\infty} p(x)\, dx}

For a discrete DF:

    \langle x \rangle = \frac{\sum_i x_i\, p_i}{\sum_i p_i}

This is similar to weighted means, e.g. the center of mass.

Median

The median of a DF is the value x_med for which larger and smaller values of x are equally probable:

    \int_{-\infty}^{x_{\rm med}} p(x)\, dx = \frac{1}{2} = \int_{x_{\rm med}}^{\infty} p(x)\, dx

For discrete values, sort in ascending order, then:

    x_{\rm med} = x_{(N+1)/2} (N odd),   x_{\rm med} = \frac{1}{2}\left(x_{N/2} + x_{N/2+1}\right) (N even)

Mode

The mode of a probability DF p(x) is the value of x where the DF takes on a maximum value. It is most useful when there is a single, sharp maximum, in which case it estimates the central value. Sometimes a distribution will be bimodal, with two relative maxima. In this case the mean and median are not very useful, since they give only a "compromise" value between the two peaks.

Comparing Distributions

Often we want to know if two distributions have different means or variances (NRiC §14.2):
1. Student's t-test for significantly different means.
   a) Find the number of standard errors (~σ/N^{1/2}) between the two means.
   b) Compute the statistic using a nasty formula.
   c) A small value of the resulting probability indicates a significant difference.
2. F-test for significantly different variances.
   a) Compute F = Var1/Var2 and plug it into a nasty formula.
   b) A small value indicates a significant difference.

Comparing Distributions, Cont'd

Given two sets of data, we can generalize to a single question: are the sets drawn from the same DF? Recall that we can only disprove, not prove. We may have continuous or binned data, and may want to compare one data set with a known DF, or two unknown data sets with each other.

A popular technique for binned data is the χ² test. For continuous data, use the KS test.
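The t and F statistics above can be sketched as follows (a minimal sketch; function names are my own, and the "nasty formulas" mapping each statistic to a probability — incomplete beta functions in NRiC — are omitted):

```python
import math

def student_t(x, y):
    """Student's t statistic for two samples, using the pooled variance
    (the classic case where both samples share one underlying variance)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    # Pooled standard error of the difference of the means
    sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2) * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / sd

def f_statistic(x, y):
    """F = Var1/Var2, with the larger variance on top by convention,
    so F >= 1 always."""
    def var(d):
        m = sum(d) / len(d)
        return sum((v - m) ** 2 for v in d) / (len(d) - 1)
    v1, v2 = var(x), var(y)
    return max(v1, v2) / min(v1, v2)
```

In practice the statistic is then fed to the appropriate distribution (Student's distribution with n1 + n2 - 2 degrees of freedom for t; the F-distribution for F) to get the significance.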
Chi-Square (χ²) Test

Cf. NRiC §14.3.

Suppose we have N_i events in the ith bin but expect n_i:

    \chi^2 = \sum_i \frac{(N_i - n_i)^2}{n_i}

A large value of χ² indicates an unlikely match. Compute the probability Q(χ²|ν) from the incomplete gamma function, where ν is the number of degrees of freedom.

For two binned data sets with events R_i and S_i:

    \chi^2 = \sum_i \frac{(R_i - S_i)^2}{R_i + S_i}

Kolmogorov-Smirnov (KS) Test

Appropriate for unbinned distributions. From the sorted list of data points, construct an estimate S_N(x) of the cumulative DF of the probability DF from which the data were drawn:
- S_N(x) gives the fraction of data points to the left of x.
- It is constant between the x_i's and jumps by 1/N at each x_i.
- Note S_N(x_min) = 0, S_N(x_max) = 1.
- The behavior between x_min and x_max distinguishes distributions.

KS Test, Cont'd

The statistic is the maximum value of the absolute difference between two cumulative DFs. To compare a data set to a known cumulative DF P(x):

    D = \max_{-\infty < x < \infty} \left| S_N(x) - P(x) \right|

To compare two unknown data sets:

    D = \max_{-\infty < x < \infty} \left| S_{N_1}(x) - S_{N_2}(x) \right|

Plug D and N (or N_e = N_1 N_2 / (N_1 + N_2)) into a nasty formula to get the numerical value of the significance.
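The χ² and one-sample KS statistics above can be sketched as follows (a minimal sketch; function names are my own, and the significance formulas — the incomplete gamma function for χ², the KS probability series for D — are omitted):

```python
def chi_square(observed, expected):
    """Chi-square statistic for binned data: sum over bins of
    (N_i - n_i)^2 / n_i, where N_i is observed and n_i is expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_two_sets(r, s):
    """Chi-square for two binned data sets R_i and S_i:
    sum of (R_i - S_i)^2 / (R_i + S_i), skipping empty bin pairs."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(r, s) if a + b > 0)

def ks_statistic(data, cdf):
    """One-sample KS statistic: D = max |S_N(x) - P(x)|, where P is a
    known cumulative DF.

    S_N is a step function jumping by 1/N at each sorted data point, so
    the extreme deviation at x_i is against either i/N (just after the
    jump) or (i - 1)/N (just before it).
    """
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        p = cdf(x)
        d = max(d, abs(i / n - p), abs((i - 1) / n - p))
    return d
```

For two unknown data sets, one compares the two empirical S_N's instead and uses N_e = N1*N2/(N1 + N2) in place of N in the significance formula.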