Chapter 6: Principles of Data Reduction

1 Statistical Inference

Data X = (X_1, ..., X_n): a sample from a probability distribution f(x|θ), with θ unknown.

• Our task is to estimate θ based on the data.

Examples:
– to estimate the success probability p in a Bernoulli trial
– to estimate the approval rating p of a presidential candidate
– to estimate the average SAT score of freshmen at the national level

• Three types of methods for inference about θ:
– point estimation (Chapter 7)
– hypothesis testing (Chapter 8)
– interval estimation (Chapter 9)

Two Steps for Statistical Inference

Step 1: Data reduction
Summarize the information about θ in the data with one or a few statistics T = T(X).
• The data X_1, ..., X_n contain much information; some of it is relevant for θ and some is not.
• Dropping irrelevant information is desirable, but dropping relevant information is undesirable.
• The dimension of T is generally smaller than the sample size n.

Step 2: Estimator construction
Use T to construct point estimators, test statistics, and upper/lower confidence limits.

2 Statistics and Partitions

Def.: A statistic T(X) is a function of the sample X_1, ..., X_n. Examples:
• the sample mean X̄ and the sample variance S²
• the largest order statistic X_(n) and the smallest order statistic X_(1)

Partition of the Sample Space by T(X)

Consider the discrete case. For any possible value t of T, there is a corresponding set

    A_t = {x : T(x) = t}.

The collection {A_t, all t} forms a partition of the sample space of X. Note that

    P(T(X) = t) = Σ_{x ∈ A_t} P(X = x).

The event {X = x} is a subset of {T(X) = T(x)}, i.e.,

    {ω | X(ω) = x} ⊂ {ω | T(X(ω)) = T(x)}.

Example. Toss a coin n = 3 times, and let X_1, X_2, X_3 be the outcomes of the three tosses. Let T = the total number of heads obtained, i.e., T = Σ_{i=1}^3 X_i.
Write down the partition of the sample space given by T.

Remark: Often T has a simpler data structure and distribution than the original sample X = (X_1, ..., X_n), so it would be nice if we could use T(X) to summarize, and then replace, the entire data set.

Important Issues in Data Reduction:
We should think about the following questions carefully before the "simplification" process:
• Is there any loss of information due to summarization?
• How do we compare the amount of information about θ in the original data X and in T(X)?
• Is it sufficient to consider only the "reduced data" T?

3 Sufficient Statistics

A statistic T is called sufficient if the conditional distribution of X given T is free of θ (that is, the conditional distribution is completely known).

Remark 1:
• A distribution free of θ is completely known, hence the corresponding random quantity can be generated with a random number generator.

Example. Toss a coin n times, where the probability of heads is an unknown parameter p. Let T = the total number of heads. Is T sufficient for p?

Remark 2:
• Consider the discrete random variable case, and assume T is sufficient for θ. Given any value t of T, we can define the conditional distribution of X given T(X) = t (with the restricted sample space A_t), and generate pseudo-data X′ with a random number generator according to

    P(X′ = x | T(X) = T(x)) = P(X = x | T(X) = T(x)).   (∗)

Note that (∗) is a probability distribution defined on the set A_{T(x)}. Both {X = x} and {X′ = x} are subsets of {T(X) = T(x)}. One can show that X and X′ have identical distributions, i.e.,

    P_θ(X = x) = P_θ(X′ = x)  for all x and θ.

• Since X′_1, ..., X′_n can be regarded as another random sample from the same population as the original data X_1, ..., X_n, they contain an equal amount of probabilistic information about θ. Therefore, we can "recover the data" if we retain T and discard X_1, ..., X_n.
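The coin-toss partition above is small enough to enumerate by brute force. The following Python sketch (variable names are illustrative) lists the sets A_t for n = 3 tosses, coding heads as 1 and tails as 0:

```python
from itertools import product

# Partition of the sample space for the coin-toss example:
# n = 3 tosses, 1 = head, 0 = tail, T(x) = total number of heads.
n = 3
partition = {}
for x in product([0, 1], repeat=n):
    partition.setdefault(sum(x), []).append(x)

for t in sorted(partition):
    print(f"A_{t}: {partition[t]}")
# T takes values 0, 1, 2, 3 with |A_0| = 1, |A_1| = 3, |A_2| = 3, |A_3| = 1.
```

The eight outcomes split into four sets, one per value of T, and their sizes 1, 3, 3, 1 are the binomial coefficients C(3, t), consistent with P(T = t) = Σ_{x ∈ A_t} P(X = x).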
That is why T is called "sufficient".
• For the continuous case the same conclusion holds (proof omitted for technical reasons).

Sufficiency Principle

If T is sufficient, the "extra information" carried by X is worthless as far as θ is concerned. It is then only natural to consider inference procedures that do not use this extra, irrelevant information. This leads to the Sufficiency Principle:

Any inference procedure should depend on the data only through sufficient statistics.

Conditional Probability (discrete case):
For any x and t, we have

    P_θ(X = x | T(X) = t) = P_θ(X = x, T(X) = t) / P_θ(T(X) = t)
                          = P_θ(X = x) / P_θ(T(X) = t)   if T(x) = t,

and P_θ(X = x | T(X) = t) = 0 if T(x) ≠ t. So we only need to compute P_θ(X = x | T(X) = T(x)).

How to check sufficiency?
Let the distribution of the data X be p(x; θ) and the distribution of T be q(t; θ). The conditional distribution given T = t is p(x; θ)/q(t; θ) on A_t. If T is sufficient for θ, this must be free of θ (though it may depend on x) for every t. In other words, if

    p(x; θ)/q(T(x); θ) is free of θ (but may depend on x)   (∗)

for all x and θ, then T is a sufficient statistic for θ.

Example. X_1, ..., X_n iid N(θ, 1). T = X̄.

Remarks: Criterion (∗) is still not very convenient to apply:
• we need to guess the form of a sufficient statistic, and
• we need to figure out the distribution of T.

How to find a sufficient statistic?

(Neyman–Fisher) Factorization Theorem. T is sufficient if and only if p(x; θ) can be written as the product g(T(x); θ) h(x), where the first factor depends on x only through T(x) and the second factor is free of θ.

Proof. (for the discrete case only)

Example. Binomial. iid bin(1, θ).
Example. Poisson. iid Poi(θ).
Example. Exponential. iid Exp(β).
Example. Normal. iid N(θ, 1).

When the range of X depends on θ, one should be more careful about the factorization: the indicator functions must be used explicitly.

Example. Uniform. iid U(0, θ).

Two-Dimensional Examples.
Normal. iid N(µ, σ²), with θ = (µ, σ²) (both unknown).
Gamma. iid Ga(α, β).
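Criterion (∗) can be illustrated numerically. The sketch below (an illustration, not part of the notes) uses the Poisson example: for X_1, ..., X_n iid Poi(θ), T = ΣX_i has the Poi(nθ) distribution, and the ratio p(x; θ)/q(T(x); θ) should come out the same for every value of θ we try:

```python
from math import exp, factorial, prod

# Criterion (*): T is sufficient if p(x; θ) / q(T(x); θ) is free of θ.
# Poisson case: X_1, ..., X_n iid Poi(θ), T = ΣX_i ~ Poi(nθ).

def joint_pmf(x, theta):
    """p(x; θ) = Π e^{-θ} θ^{x_i} / x_i!"""
    return prod(exp(-theta) * theta**xi / factorial(xi) for xi in x)

def T_pmf(t, theta, n):
    """q(t; θ): pmf of Poi(nθ) at t."""
    lam = n * theta
    return exp(-lam) * lam**t / factorial(t)

x = (2, 0, 3, 1)      # an arbitrary sample of size n = 4
t = sum(x)
ratios = [joint_pmf(x, th) / T_pmf(t, th, len(x)) for th in (0.5, 1.3, 4.0)]
print(ratios)
# Closed form of the ratio: t! / (n^t * Π x_i!), which is free of θ.
```

All three ratios agree (up to floating-point error) with t!/(n^t Π x_i!), so the conditional distribution of X given T is free of θ, as the factorization p(x; θ) = [e^{-nθ} θ^{Σx_i}] · [1/Π x_i!] predicts.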
θ = (α, β).

All the examples above except the last one are special cases of a general result for the exponential family.

Exponential Family: Recall the density function of an exponential family,

    f(x; θ) = c(θ) h(x) exp[ Σ_{j=1}^k w_j(θ) t_j(x) ],   θ = (θ_1, ..., θ_d)

(a full exponential family if d = k; a curved family if d < k).

Theorem. Let X_1, ..., X_n be a random sample from the exponential family. Then

    T(X) = ( Σ_{i=1}^n t_1(X_i), ..., Σ_{i=1}^n t_k(X_i) )

is sufficient for θ = (θ_1, ..., θ_d).

Exercise. Apply the general exponential family result to all the standard families discussed above, such as the binomial, Poisson, normal, exponential, and gamma.

More Applications of Exponential Family Examples

Example. Beta(α, β).

A Non-Exponential-Family Example.
Discrete uniform. P(X = x) = 1/θ, x = 1, ..., θ, where θ is a positive integer.

Universal Cases.
X_1, ..., X_n are
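As a concrete instance of the exponential-family theorem, for N(µ, σ²) one may take t_1(x) = x and t_2(x) = x², so T(X) = (Σx_i, Σx_i²) is sufficient for θ = (µ, σ²). The sketch below (simulated data; parameter values are illustrative) computes T and shows that (x̄, s²) can be recovered from it, so the familiar pair (x̄, s²) is an equivalent sufficient statistic:

```python
import random

# Sufficient statistic for N(µ, σ²): T = (Σx_i, Σx_i²).
random.seed(0)
sample = [random.gauss(10, 2) for _ in range(100)]  # illustrative µ=10, σ=2
n = len(sample)

T = (sum(sample), sum(x * x for x in sample))

# (x̄, s²) is a one-to-one function of T, hence also sufficient:
xbar = T[0] / n
s2 = (T[1] - n * xbar**2) / (n - 1)   # sample variance from the sums
print(T, xbar, s2)
```

Once T is stored, the individual observations can be discarded for any inference about (µ, σ²); nothing beyond the two sums is needed.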