NCSU ST 522 - Principles of Data Reduction

Chapter 6: Principles of Data Reduction

1 Statistical Inference

Data $X = (X_1, \ldots, X_n)$: a sample from a probability distribution $f(x \mid \theta)$, with $\theta$ unknown.

• Our task is to estimate $\theta$ based on the data. Examples:
  – to estimate the success probability $p$ in a Bernoulli trial
  – to estimate the support rate $p$ of a presidential candidate
  – to estimate the average SAT score of freshmen at the national level
• Three types of methods for inference about $\theta$:
  – point estimation (Chapter 7)
  – hypothesis testing (Chapter 8)
  – interval estimation (Chapter 9)

Two Steps for Statistical Inference

Step 1. Data reduction: summarize the information about $\theta$ in the data with one or a few statistics $T = T(X)$.
• The data $X_1, \ldots, X_n$ contain much information, some of which is relevant for $\theta$ and some of which is not.
• Dropping irrelevant information is desirable, but dropping relevant information is undesirable.
• The dimension of $T$ is generally smaller than the sample size $n$.

Step 2. Estimator construction: use $T$ to construct point estimators, test statistics, and upper/lower confidence limits.

2 Statistics and Partition

Def.: A statistic $T(X)$ is a function of the sample $X_1, \ldots, X_n$. Examples:
• the sample mean $\bar{X}$ and the sample variance $S^2$
• the largest order statistic $X_{(n)}$ and the smallest order statistic $X_{(1)}$

Partition of Sample Space by T(X)

Consider the discrete case. For any possible value $t$ of $T$, there is a corresponding set
\[ A_t = \{x : T(x) = t\}. \]
The collection of sets $\{A_t : \text{all } t\}$ forms a partition of the sample space of $X$. Note that
\[ P(T(X) = t) = \sum_{x \in A_t} P(X = x). \]
The event $\{X = x\}$ is a subset of $\{T(X) = T(x)\}$, i.e.,
\[ \{\omega \mid X(\omega) = x\} \subset \{\omega \mid T(X(\omega)) = T(x)\}. \]

Example. Toss a coin $n = 3$ times, and let $X_1, X_2, X_3$ be the respective outcomes of the tosses. Let $T$ = the total number of heads obtained, i.e., $T = \sum_{i=1}^{3} X_i$.
Write down the partition of the sample space given by $T$.

Remark: Often $T$ has a simpler data structure and distribution than the original sample $X = (X_1, \ldots, X_n)$, so it would be nice if we could use $T(X)$ to summarize and then replace the entire data.

Important Issues in Data Reduction:

We should think about the following questions carefully before the “simplification” process:
• Is there any loss of information due to summarization?
• How do we compare the amount of information about $\theta$ in the original data $X$ and in $T(X)$?
• Is it sufficient to consider only the “reduced data” $T$?

3 Sufficient Statistics

A statistic $T$ is called sufficient if the conditional distribution of $X$ given $T$ is free of $\theta$ (that is, the conditional distribution is completely known).

Remark 1:
• A distribution free of $\theta$ is completely known, hence the corresponding random quantity can be generated with a random number generator.

Example. Toss a coin $n$ times, where the probability of a head is an unknown parameter $p$. Let $T$ = the total number of heads. Is $T$ sufficient for $p$?

Remark 2:
• Consider the discrete case. Assume $T$ is sufficient for $\theta$. Given any value $t$ of $T$, we can define the conditional distribution of $X$ given $T(X) = t$ (with the restricted sample space $A_t$), and generate pseudo-data $X'$ with a random number generator according to
\[ P(X' = x \mid T(X) = T(x)) = P(X = x \mid T(X) = T(x)). \qquad (*) \]
Note that (*) is a probability distribution defined on the set $A_{T(x)}$. Both $\{X = x\}$ and $\{X' = x\}$ are subsets of $\{T(X) = T(x)\}$. We can show that $X$ and $X'$ have the identical distribution, i.e.,
\[ P_\theta(X = x) = P_\theta(X' = x), \quad \forall x, \theta. \]
Indeed, $P_\theta(X' = x) = P(X' = x \mid T(X) = T(x))\, P_\theta(T(X) = T(x)) = P(X = x \mid T(X) = T(x))\, P_\theta(T(X) = T(x)) = P_\theta(X = x)$.
• Since $X'_1, \ldots, X'_n$ can be regarded as another random sample from the same population as the original data $X_1, \ldots, X_n$, they contain an equal amount of probabilistic information about $\theta$. Therefore, we can “recover the data” if we retain $T$ and discard $X_1, \ldots, X_n$.
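As an illustration of the definition and of Remark 2, the following Python sketch works through the coin-tossing example by brute-force enumeration. The function names `conditional_given_total` and `draw_pseudo_data`, and the values n = 4, t = 2, p = 0.2 and p = 0.7, are assumptions chosen for this illustration only; they are not part of the notes. It computes $P(X = x \mid T = t)$ on $A_t$ for two different values of $p$ and then draws one pseudo-sample $X'$ from that conditional distribution.

```python
import random
from itertools import product
from math import comb, isclose

def conditional_given_total(n, t, p):
    """Return {x: P(X = x | T = t)} for n iid Bernoulli(p) tosses, T = sum(x)."""
    # Joint probability of each outcome x in the block A_t = {x : sum(x) = t}.
    joint = {
        x: p ** sum(x) * (1 - p) ** (n - sum(x))
        for x in product((0, 1), repeat=n)
        if sum(x) == t
    }
    # P(T = t) is the total probability of the block A_t.
    prob_t = sum(joint.values())
    return {x: px / prob_t for x, px in joint.items()}

def draw_pseudo_data(n, t, rng):
    """Draw X' uniformly from A_t by choosing which t tosses are heads."""
    heads = set(rng.sample(range(n), t))
    return tuple(1 if i in heads else 0 for i in range(n))

n, t = 4, 2  # illustrative values, not from the notes
cond_a = conditional_given_total(n, t, p=0.2)
cond_b = conditional_given_total(n, t, p=0.7)

# Print the conditional probabilities under both values of p, side by side.
for x in cond_a:
    print(x, round(cond_a[x], 6), round(cond_b[x], 6))

# Pseudo-data generated knowing only T = t, with no knowledge of p.
pseudo = draw_pseudo_data(n, t, random.Random(0))
print("pseudo-data:", pseudo)
```

In this sketch the two columns agree and every outcome in $A_t$ has conditional probability $1/\binom{n}{t}$, so the conditional distribution does not involve $p$, and the pseudo-data can be generated from $T$ alone.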
That is why $T$ is “sufficient”.
• For the continuous case, we have the same conclusion (proof omitted for technical reasons).

Sufficiency Principle

If $T$ is sufficient, the “extra information” carried by $X$ is worthless as far as $\theta$ is concerned. It is then only natural to consider inference procedures which do not use this extra irrelevant information. This leads to the Sufficiency Principle:

Any inference procedure should depend on the data only through sufficient statistics.

Conditional Probability (discrete case):

For any $x$ and $t$, we have
\[ P_\theta(X = x \mid T(X) = t) = \begin{cases} \dfrac{P_\theta(X = x,\, T(X) = t)}{P_\theta(T(X) = t)} = \dfrac{P_\theta(X = x)}{P_\theta(T(X) = t)} & \text{if } T(x) = t, \\ 0 & \text{if } T(x) \neq t, \end{cases} \]
so we only need to compute $P_\theta(X = x \mid T(X) = T(x))$.

How to check sufficiency?

Let the distribution of the data $X$ be $p(x; \theta)$ and the distribution of $T$ be $q(t; \theta)$. The conditional distribution of $X$ given $T = t$ is $p(x; \theta)/q(t; \theta)$ on $A_t$. If $T$ is sufficient for $\theta$, this should be free of $\theta$ (but may depend on $x$) for all $t$. In other words, if
\[ p(x; \theta)/q(T(x); \theta) \ \text{is free of } \theta \ \text{(but may depend on } x\text{)} \qquad (*) \]
for all $x$ and $\theta$, then $T$ is a sufficient statistic for $\theta$.

Example. $X_1, \ldots, X_n$ iid $N(\theta, 1)$. $T = \bar{X}$.

Remarks: (*) is still not very convenient to apply.
• We need to guess the form of a sufficient statistic.
• We need to figure out the distribution of $T$.

How to find a sufficient statistic?

(Neyman-Fisher) Factorization theorem. $T$ is sufficient if and only if $p(x; \theta)$ can be written as the product $g(T(x); \theta)\, h(x)$, where the first factor depends on $x$ only through $T(x)$ and the second factor is free of $\theta$.

Proof. (for discrete case only)

Example. Binomial. iid $\mathrm{bin}(1, \theta)$.
Example. Poisson. iid $\mathrm{Poi}(\theta)$.
Example. $\mathrm{Exp}(\beta)$.
Example. Normal. iid $N(\theta, 1)$.

When the range of $X$ depends on $\theta$, we should be more careful about factorization and must use indicator functions explicitly.

Example. Uniform. iid $U(0, \theta)$.

Two-dimensional Examples.
Normal. iid $N(\mu, \sigma^2)$. $\theta = (\mu, \sigma^2)$ (both unknown).
Gamma. iid $\mathrm{Ga}(\alpha, \beta)$.
$\theta = (\alpha, \beta)$.

All the examples above except the uniform $U(0, \theta)$ example are special cases of a general result for the exponential family.

Exponential Family: Recall the density function of an exponential family
\[ f(x; \theta) = c(\theta)\, h(x) \exp\Big[\sum_{j=1}^{k} w_j(\theta)\, t_j(x)\Big], \quad \theta = (\theta_1, \ldots, \theta_d) \]
(full exponential family if $d = k$; curved family if $d < k$).

Theorem. Let $X_1, \ldots, X_n$ be a random sample from the exponential family. Then
\[ T(X) = \Big(\sum_{i=1}^{n} t_1(X_i), \ldots, \sum_{i=1}^{n} t_k(X_i)\Big) \]
is sufficient for $\theta = (\theta_1, \ldots, \theta_d)$.

Exercise. Apply the general exponential family result to all the standard families discussed above, such as binomial, Poisson, normal, exponential, and gamma.

More Applications of Exponential Family Examples

Example. $\mathrm{Beta}(\alpha, \beta)$.

A Non-Exponential Family Example.
Discrete uniform. $P(X = x) = 1/\theta$, $x = 1, \ldots, \theta$, where $\theta$ is a positive integer.

Universal Cases.
$X_1, \ldots, X_n$ are
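Returning to the discrete uniform example and the factorization theorem, the following Python sketch checks numerically one possible factorization, with $T(x) = \max_i x_i$, $g(t; \theta) = \theta^{-n}\, 1\{t \le \theta\}$ and $h(x) = 1\{\min_i x_i \ge 1\}$. This choice of $T$, $g$ and $h$, the function names `joint_pmf` and `factorized_pmf`, and the two small samples are assumptions made for this illustration; the notes leave the example to be worked out. Exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def joint_pmf(x, theta):
    """P(X_1 = x_1, ..., X_n = x_n) for iid discrete uniform on {1, ..., theta}."""
    prob = Fraction(1)
    for xi in x:
        # Each factor is 1/theta on the support and 0 off it (the indicator).
        prob *= Fraction(1, theta) if 1 <= xi <= theta else Fraction(0)
    return prob

def factorized_pmf(x, theta):
    """g(T(x); theta) * h(x) with T(x) = max(x)."""
    n, t = len(x), max(x)
    g = Fraction(1, theta ** n) if t <= theta else Fraction(0)  # depends on x only via t
    h = 1 if min(x) >= 1 else 0  # free of theta
    return g * h

sample_a = (3, 1, 5, 2)  # illustrative data, not from the notes
sample_b = (5, 5, 4, 1)  # different data with the same maximum
thetas = range(1, 9)

# The direct product and the factorized form should agree for every theta.
agree = all(joint_pmf(sample_a, th) == factorized_pmf(sample_a, th) for th in thetas)
# Two samples with the same n and the same maximum share one likelihood function.
same_likelihood = all(joint_pmf(sample_a, th) == joint_pmf(sample_b, th) for th in thetas)
print(agree, same_likelihood)
```

The indicator $1\{t \le \theta\}$ is written out explicitly here because the range of $X$ depends on $\theta$, which is the situation the notes flag as requiring care.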

