March 12, 2003

2.5 Exponential families

These will be families $\mathcal{P}$ of laws, including many of the best-known special families such as the binomial and normal laws, for which there is a natural vector-valued sufficient statistic whose dimension stays constant as the sample size $n$ increases, and which has the Lehmann-Scheffé property.

Definition. A family $\mathcal{P} = \{Q_\psi,\ \psi \in \Psi\}$ of laws on a measurable space $(X, \mathcal{B})$, containing at least two different laws, is called an exponential family if there exist a $\sigma$-finite measure $\mu$ on $(X, \mathcal{B})$, a positive integer $k$, real functions $\theta_j$ on $\Psi$, and measurable $h$ with $0 \le h(x) < \infty$ and $T_j$ on $X$ for $j = 1, \dots, k$, such that for all $\psi \in \Psi$, $Q_\psi$ is absolutely continuous with respect to $\mu$, and for some $C(\theta(\psi)) > 0$, where $\theta(\psi) := (\theta_1(\psi), \dots, \theta_k(\psi)) \in \mathbb{R}^k$,

$$\frac{dQ_\psi}{d\mu}(x) = C(\theta(\psi))\, h(x) \exp\Bigl(\sum_{j=1}^{k} \theta_j(\psi)\, T_j(x)\Bigr). \tag{2.5.1}$$

If we replace $\mu$ by $\nu$, where $d\nu(x) = h(x)\,d\mu(x)$, the factor $h(x)$ can be omitted, and $\nu$ is still a $\sigma$-finite measure. Given the $\theta_j$, $T_j$, $h$ and $\mu$, the number $C(\theta(\psi))$ is determined by normalization, so it is in fact a function of $\{\theta_j(\psi)\}_{j=1}^{k}$. Thus, given $T_j$, $h$ and $\mu$, $Q_\psi$ is determined by the values of $\theta_j(\psi)$. It follows from the factorization theorem (Corollary 2.1.5) that for any exponential family, the vector-valued statistic $(T_1(x), \dots, T_k(x))$ is a sufficient statistic.

The structure of an exponential family is essentially preserved by taking $n$ i.i.d. observations, as follows. Let $\{Q_\psi,\ \psi \in \Psi\}$ be any exponential family and let $X_1, \dots, X_n$ be i.i.d. $(Q_\psi)$. Then the distribution $Q_\psi^n$ of $(X_1, \dots, X_n)$ is an exponential family for the $\sigma$-finite measure $\mu^n$ on $X^n$, replacing $T_j(x)$ by $\sum_{i=1}^{n} T_j(X_i)$, $h(x)$ by $\prod_{j=1}^{n} h(X_j)$, and $C(\theta)$ by $C(\theta)^n$. It follows that for $n$ i.i.d. observations from the exponential family, the $k$-vector $\bigl\{\sum_{i=1}^{n} T_j(X_i)\bigr\}_{j=1}^{k}$ is a sufficient statistic.

Since exponentials are strictly positive, any exponential family is an equivalent family as defined in the last section.

The $T_j$ will be called affinely dependent if for some constants $c_0, c_1, \dots, c_k$, not all 0, $c_0 + c_1 T_1 + \cdots + c_k T_k = 0$ almost everywhere for $\nu$. Then $c_i \ne 0$ for some $i \ge 1$, and we can solve for $T_i$ as a linear combination of the other $T_j$ and a constant. Then we can eliminate the $T_i$ term and reduce $k$ by 1, adding constants times $\theta_i$ to each $\theta_j$ for $j \ne i$. Iterating this, we can assume that $T_1, \dots, T_k$ are affinely independent, i.e. they are not affinely dependent. Likewise, we can define affine independence for the functions $\theta_j$, where now the linear relations among the $\theta_j$ and a constant would hold everywhere rather than almost everywhere (at this point we are not assuming a prior given on the parameter space). We can eliminate terms until the $\theta_j$ are also affinely independent. We will always still have $k \ge 1$, since $\mathcal{P}$ contains at least two laws.

Let $\Theta$ be the range of the function $\psi \mapsto (\theta_1(\psi), \dots, \theta_k(\psi))$ from $\Psi$ into $\mathbb{R}^k$. Then clearly $\theta_1, \dots, \theta_k$ are affinely independent if and only if $\Theta$ is not included in any $(k-1)$-dimensional hyperplane in $\mathbb{R}^k$. Likewise, $T_1, \dots, T_k$ are affinely independent as defined above if and only if, for $T := (T_1, \dots, T_k)$ from $X$ into $\mathbb{R}^k$, the measure $\nu \circ T^{-1}$ is not concentrated in any $(k-1)$-dimensional hyperplane in $\mathbb{R}^k$.

For each $\theta \in \Theta$ let $P_\theta$ be the law on $X$ with $(dP_\theta/d\mu)(x) = C(\theta)\, h(x)\, e^{\theta \cdot T(x)}$, where $\theta \cdot T := \sum_{j=1}^{k} \theta_j T_j$. Then $Q_\psi = P_{\theta(\psi)}$ for all $\psi \in \Psi$, and $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$. A representation (2.5.1) of an exponential family will be called minimal if $T_1, \dots, T_k$ are affinely independent, as are $\theta_1, \dots, \theta_k$.

A functionoid is an equivalence class of functions for the relation of almost sure equality for a measure. The well-known Banach spaces $L^p$ of $p$-integrable functions, such as the Hilbert spaces $L^2$ of square-integrable functions, are actually spaces of functionoids. For an exponential family, or any other equivalent family, almost sure equality is the same for $P_\theta$ for all $\theta$.

2.5.2 Theorem. Every exponential family $\mathcal{P} = \{Q_\psi,\ \psi \in \Psi\}$ has a minimal representation (2.5.1), and then $k$ is uniquely determined.

Proof. We already saw that the $T_j$ can be taken to be affinely independent, as can the $\theta_j$, so that the representation (2.5.1) is minimal. As we also saw, the family $\mathcal{P}$ can be written as $\{P_\theta,\ \theta \in \Theta\}$, $\Theta \subset \mathbb{R}^k$, where $(dP_\theta/d\mu)(x) = C(\theta)\, h(x)\, e^{\theta \cdot T(x)}$. Then the likelihood ratios are all of the form

$$R_{P_\theta/P_{\theta'}}(x) = C(\theta)\, C(\theta')^{-1} \exp\Bigl(\sum_{j=1}^{k} (\theta_j - \theta'_j)\, T_j(x)\Bigr).$$

The logarithms of these likelihood ratios (log likelihood ratios), plus constants, span a real vector space $V_T$ of functionoids on $X$, included in the vector space $W_T$ of functionoids spanned by $1, T_1, \dots, T_k$. Then $W_T$ is $(k+1)$-dimensional, since $T_1, \dots, T_k$ are affinely independent by minimality. Also, since $\theta_1, \dots, \theta_k$ are affinely independent on $\Psi$, $V_T = W_T$. Now $V = V_T$ is determined by the family $\mathcal{P}$, not depending on the choice of $\mu$, $\theta$ or $T$, so $V$ and $k$ are uniquely defined for the family $\mathcal{P}$.

The number $k$ will be called the order of the exponential family. From here on it will be assumed that the representation of an exponential family is minimal unless it is specifically said not to be.

Any exponential family $\mathcal{P}$ can be parameterized by a subset $\Theta$ of $\mathbb{R}^k$, replacing $\theta_j(\psi)$ by $\theta_j$, with $\theta := (\theta_1, \dots, \theta_k)$ and

$$\frac{dP_\theta}{d\mu}(x) = C(\theta)\, h(x) \exp\Bigl(\sum_{j=1}^{k} \theta_j T_j(x)\Bigr), \qquad \theta \in \Theta \subset \mathbb{R}^k, \tag{2.5.3}$$

where now $Q_\psi = P_{\theta(\psi)}$ for all $\psi \in \Psi$. The parameterization in (2.5.3) is then one-to-one:

2.5.4 Theorem. If an exponential family has a minimal representation (2.5.3), then for any $\theta \ne \theta'$ in $\Theta$, $P_\theta \ne P_{\theta'}$.

Proof. If $P_\theta = P_{\theta'}$, then for $\theta \cdot T := \sum_j \theta_j T_j$ we have almost everywhere $\theta \cdot T + \log C(\theta) = \theta' \cdot T + \log C(\theta')$, or $(\theta - \theta') \cdot T = c$ for some $c$ not depending on $x$. But $\theta \ne \theta'$ means that the $T_j$ are affinely dependent, contradicting minimality.

Any subset of an exponential family is also an exponential family with the same $T_j$ and $\nu$ (recalling that $d\nu(x) = h(x)\,d\mu(x)$). It can be useful to take an exponential family as large as possible. Given $\nu$ and $T_j$, $j = 1, \dots, k$, the natural parameter space of the exponential family is the set of all $\theta = (\theta_1, \dots, \theta_k) \in \mathbb{R}^k$ such that

$$K(\theta) := \int \exp\Bigl(\sum_{j=1}^{k} \theta_j T_j(x)\Bigr)\, d\nu(x) < \infty. \tag{2.5.5}$$

Clearly $K(\theta) > 0$ for all $\theta$. For any $\theta$ in the natural parameter space we can define $C(\theta) = 1/K(\theta)$ and get a probability measure $P_\theta$ given by (2.5.3). So we have a family of laws $P_\theta$ indexed by the natural parameter space. The family doesn't extend to values of $\theta$ outside the natural parameter space, since then normalization is not possible.

2.5.6 Theorem. For any …
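As an illustration of the definitions in (2.5.3) and (2.5.5), the binomial family mentioned at the start of this section can be checked numerically. The sketch below is not part of the notes. It assumes a fixed number of trials `m` (a name chosen here to avoid clashing with the sample size $n$), counting measure $\mu$ on $\{0, \dots, m\}$, $h(x) = \binom{m}{x}$, $T(x) = x$ and $\theta = \log(p/(1-p))$, so that $k = 1$. All function names are hypothetical.

```python
import math

def binom_pmf(x, m, p):
    """Binomial(m, p) probability of x successes, computed directly."""
    return math.comb(m, x) * p**x * (1 - p)**(m - x)

def natural_parameter(p):
    """Assumed parameterization: theta is the log-odds log(p / (1 - p))."""
    return math.log(p / (1 - p))

def K(theta, m):
    """Normalizing sum as in (2.5.5), with nu({x}) = comb(m, x) and T(x) = x."""
    return sum(math.comb(m, x) * math.exp(theta * x) for x in range(m + 1))

def expfam_pmf(x, m, theta):
    """Density as in (2.5.3) w.r.t. counting measure, with C(theta) = 1 / K(theta)."""
    return math.comb(m, x) * math.exp(theta * x) / K(theta, m)

m, p = 7, 0.3
theta = natural_parameter(p)

# Largest pointwise gap between the direct pmf and the exponential-family form.
max_gap = max(
    abs(binom_pmf(x, m, p) - expfam_pmf(x, m, theta)) for x in range(m + 1)
)

# By the binomial theorem, K(theta) = (1 + e^theta)^m, which is finite for every real theta.
closed_form_gap = abs(K(theta, m) - (1 + math.exp(theta)) ** m)
```

Under these assumptions both gaps should be at the level of floating-point rounding. Since $K(\theta) = (1 + e^\theta)^m$ is finite for every real $\theta$, the natural parameter space of this family is all of $\mathbb{R}$.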