MIT 18 466 - Exponential families


March 12, 2003

2.5 Exponential families. These will be families {P_θ, θ ∈ Θ} of laws, including many of the best-known special families such as the binomial and normal laws, and for which there is a natural vector-valued sufficient statistic, whose dimension stays constant as the sample size n increases, and which has the Lehmann-Scheffé property.

Definition. A family P = {Q_ψ : ψ ∈ Ψ} of laws on a measurable space (X, B), containing at least two different laws, is called an exponential family if there exist a σ-finite measure µ on (X, B), a positive integer k, real functions θ_j on Ψ, and measurable functions h with 0 < h(x) < ∞ and T_j on X for j = 1, ..., k, such that for all ψ ∈ Ψ, Q_ψ is absolutely continuous with respect to µ, and for some C(θ(ψ)) > 0, where θ(ψ) := (θ_1(ψ), ..., θ_k(ψ)),

    (2.5.1)  (dQ_ψ/dµ)(x) = C(θ(ψ)) h(x) exp( Σ_{j=1}^k θ_j(ψ) T_j(x) ).

If we replace µ by ν where dν(x) = h(x) dµ(x), the factor h(x) can be omitted, and ν is still a σ-finite measure. Given the θ_j, T_j, h, and µ, the number C(θ(ψ)) is determined by normalization, so it is, in fact, a function of θ(ψ) := {θ_j(ψ)}_{j=1}^k. Thus, given T_j, h, and µ, Q_ψ is determined by the values of θ_j(ψ).

It follows from the factorization theorem, Corollary 2.1.5, that for any exponential family, the vector-valued statistic (T_1(x), ..., T_k(x)) is a sufficient statistic. The structure of an exponential family is essentially preserved by taking n i.i.d. observations, as follows. Let {Q_ψ, ψ ∈ Ψ} be any exponential family and let X_1, ..., X_n be i.i.d. (Q_ψ). Then the distribution Q_ψ^n of (X_1, ..., X_n) is an exponential family for the σ-finite measure µ^n on X^n, replacing T_j(x) by Σ_{i=1}^n T_j(X_i), h(x) by Π_{i=1}^n h(X_i), and C(θ(ψ)) by C(θ(ψ))^n. It follows that for n i.i.d. observations from the exponential family, the k-vector {Σ_{i=1}^n T_j(X_i)}_{j=1}^k is a sufficient statistic.

Since exponentials are strictly positive, any exponential family is an equivalent family as defined in the last section.
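As an illustration that is not part of the original notes, the normal laws N(m, s^2), which the section names as a special case, can be put in the form (2.5.1) with k = 2, µ = Lebesgue measure, h = 1, T_1(x) = x, T_2(x) = x^2, θ(ψ) = (m/s^2, −1/(2s^2)) for ψ = (m, s), and C(θ(ψ)) = exp(−m^2/(2s^2)) / (s √(2π)). The sketch below checks this numerically and shows that the log-likelihood of n i.i.d. observations depends on the data only through (Σ X_i, Σ X_i^2). All function names are hypothetical and chosen only for this example.

```python
import math

def normal_pdf(x, m, s):
    """Density of N(m, s^2) with respect to Lebesgue measure."""
    return math.exp(-(x - m) ** 2 / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def natural_params(m, s):
    """theta(psi) for psi = (m, s): coefficients of T_1(x) = x and T_2(x) = x^2."""
    return (m / (s * s), -1.0 / (2 * s * s))

def normalizer(m, s):
    """C(theta(psi)), the constant fixed by normalization."""
    return math.exp(-m * m / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def exp_family_pdf(x, m, s):
    """C(theta) * h(x) * exp(theta_1 * T_1(x) + theta_2 * T_2(x)) with h = 1."""
    t1, t2 = natural_params(m, s)
    return normalizer(m, s) * math.exp(t1 * x + t2 * x * x)

def log_lik_from_stats(n, sum_x, sum_x2, m, s):
    """Log-likelihood of n i.i.d. points, using only (sum of T_1, sum of T_2)."""
    t1, t2 = natural_params(m, s)
    return n * math.log(normalizer(m, s)) + t1 * sum_x + t2 * sum_x2

if __name__ == "__main__":
    xs = [0.3, -1.2, 2.5, 0.7]   # arbitrary illustrative data
    m, s = 1.0, 2.0              # arbitrary illustrative parameters
    direct = sum(math.log(normal_pdf(x, m, s)) for x in xs)
    via_stats = log_lik_from_stats(len(xs), sum(xs), sum(x * x for x in xs), m, s)
    print(abs(direct - via_stats) < 1e-9)
```

The second function pair mirrors the i.i.d. statement: C is raised to the n-th power (n times its logarithm) and each T_j(x) is replaced by its sum over the sample.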
The T_j will be called affinely dependent if for some constants c_0, c_1, ..., c_k, not all 0, c_0 + c_1 T_1 + ··· + c_k T_k = 0 almost everywhere for µ. Then c_i ≠ 0 for some i ≥ 1, and we can solve for T_i as a linear combination of the other T_j and a constant. Then we can eliminate the T_i term and reduce k by 1, adding constants times θ_i(·) to each θ_j(·) for j ≠ i. Iterating this, we can assume that T_1, ..., T_k are affinely independent, i.e. they are not affinely dependent.

Likewise, we can define affine independence for the functions θ_j, where now the linear relations among the θ_j(·) and a constant would hold everywhere rather than almost everywhere (at this point we are not assuming a prior given on the parameter space Ψ). We can eliminate terms until the θ_j(·) are also affinely independent. We will always still have k ≥ 1 since P contains at least two laws.

Let Θ be the range of the function ψ ↦ θ(ψ) := (θ_1(ψ), ..., θ_k(ψ)) from Ψ into R^k. Then clearly θ_1(·), ..., θ_k(·) are affinely independent if and only if Θ is not included in any (k − 1)-dimensional hyperplane in R^k. Likewise, T_1, ..., T_k are affinely independent (as defined above) if and only if for T := (T_1, ..., T_k) from X into R^k, the measure µ ∘ T^{-1} is not concentrated in any (k − 1)-dimensional hyperplane in R^k.

For each θ ∈ Θ, let P_θ be the law on X with (dP_θ/dµ)(x) = C(θ) h(x) e^{θ·T(x)}, where θ · T := Σ_{j=1}^k θ_j T_j. Then Q_ψ = P_{θ(ψ)} for all ψ ∈ Ψ and P = {P_θ : θ ∈ Θ}. A representation (2.5.1) of an exponential family will be called minimal if T_1, ..., T_k are affinely independent, as are θ_1(·), ..., θ_k(·).

A functionoid is an equivalence class of functions for the relation of almost sure equality for a measure. The well-known Banach spaces L^p of p-integrable functions, such as the Hilbert spaces L^2 of square-integrable functions, are actually spaces of functionoids. For an exponential family, or any other equivalent family, almost sure equality is the same for P_θ for all θ.

2.5.2 Theorem.
Every exponential family P := {Q_ψ : ψ ∈ Ψ} has a minimal representation (2.5.1), and then k is uniquely determined.

Proof. We already saw that the T_j(·) can be taken to be affinely independent, as can the θ_j(·), so that the representation (2.5.1) is minimal. As we also saw, the family P can be written as {P_θ, θ ∈ Θ}, Θ ⊂ R^k, where (dP_θ/dµ)(x) = C(θ) h(x) e^{θ·T(x)}. Then the likelihood ratios are all of the form

    R_{θ,φ} := R_{P_θ/P_φ} = C(θ) C(φ)^{-1} exp{ Σ_{j=1}^k (θ_j − φ_j) T_j(x) }.

The logarithms of these likelihood ratios (log likelihood ratios) plus constants span a real vector space V_T of functionoids on X, included in the vector space W_T of functionoids spanned by 1, T_1, ..., T_k. Then W_T is (k + 1)-dimensional since T_1, ..., T_k are affinely independent by minimality. Also, since θ_1, ..., θ_k are affinely independent on Θ, V_T = W_T. Now V := V_T is determined by the family P, not depending on the choice of µ or T, so V and k are uniquely defined for the family P. □

The number k will be called the order of the exponential family. From here on it will be assumed that the representation of an exponential family is minimal unless it is specifically said not to be. Any exponential family P can be parameterized by a subset of R^k, replacing θ_j(ψ) by θ_j, with Θ = {θ(ψ) : ψ ∈ Ψ}, and

    (2.5.3)  (dP_θ/dµ)(x) = C(θ) h(x) exp( Σ_{j=1}^k θ_j T_j(x) ),  θ ∈ Θ ⊂ R^k,

where now Q_ψ = P_{θ(ψ)} for all ψ ∈ Ψ. The parameterization in (2.5.3) is then one-to-one:

2.5.4 Theorem. If an exponential family has a minimal representation (2.5.3), then for any θ ≠ φ in Θ, P_θ ≠ P_φ.

Proof. If P_θ = P_φ, then for θ · T := Σ_j θ_j T_j, the densities in (2.5.3) agree almost everywhere, so θ · T + log C(θ) = φ · T + log C(φ), or (θ − φ) · T = c for some c not depending on x. But θ ≠ φ then means that the T_j are affinely dependent, contradicting minimality. □

Any subset of an exponential family is also an exponential family with the same T_j and ν, recalling that dν(x) := h(x) dµ(x). It can be useful to take an exponential family as large as possible. Given ν and T_j, j = 1, ..., k, the natural parameter space of the exponential family is the set of all θ = (θ_1, ..., θ_k) ∈ R^k such that

    (2.5.5)  K(θ) := exp( Σ_{j=1}^k θ_j T_j
