A Relationship Between Information Inequalities and Group Theory

Desmond S. Lun

23 October 2002

1 Outline

This report presents the result recently published in [1] that establishes a one-to-one correspondence between information inequalities and group inequalities. Our aim in this report is to present this result in as concise a manner as possible whilst not excluding any steps in the derivation, thus making it suitable for brief perusal. In Sections 2–4, we introduce the notions of entropy functions, group-characterizable functions, information inequalities, and group inequalities. We confine most of the technical details to Section 5, a section that may be skipped if one is interested only in the general idea of the method by which the result is demonstrated. We complete the demonstration in Section 6 and conclude in Section 7.

2 Entropy functions

Let $X_1, X_2, \ldots, X_n$ be $n$ discrete random variables whose sample spaces are $\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_n$ respectively. Let $N = \{1, \ldots, n\}$ and let $\Omega$ be the collection of all non-empty subsets of $N$, so $|\Omega| = 2^n - 1$. For $\alpha \in \Omega$, we define $X_\alpha$ to be the joint random variable $(X_i)_{i \in \alpha}$ with sample space $\mathcal{X}_\alpha = \prod_{i \in \alpha} \mathcal{X}_i$ (i.e. the Cartesian product of $\mathcal{X}_i$ for $i \in \alpha$). For example, suppose $\alpha = \{1, 2\}$; then $X_{\{1,2\}}$ is the random pair $(X_1, X_2)$.

For a given set of random variables $X_1, X_2, \ldots, X_n$, we can define a real function over $\Omega$. Since $|\Omega|$ is finite, such a function is completely described by the set of values it takes on $\Omega$, i.e. by a point in $\mathbb{R}^{|\Omega|}$. An entropy function, therefore, is defined as follows.

Definition 1  Let $g$ be a vector in $\mathbb{R}^{|\Omega|}$ with components $g_\alpha$ indexed by $\alpha \in \Omega$. Then $g$ is an entropy function if there exists a set of random variables $X_1, X_2, \ldots, X_n$ such that $g_\alpha = H(X_\alpha)$ for all $\alpha$.

Let $\Gamma^*_n$ be the set of all entropy functions associated with $n$ random variables; that is,

$$\Gamma^*_n = \{ g \in \mathbb{R}^{|\Omega|} \mid g \text{ is an entropy function} \}. \tag{1}$$

3 Group-characterizable functions

We now define functions over $\Omega$ that can be characterized by a finite group $G$ and subgroups $G_1, G_2, \ldots, G_n$ in a particular way.
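To make Definition 1 concrete, here is a small sketch in Python (the helper names are ours, and entropies are computed in bits, i.e. with base-2 logarithms) that builds the point $g \in \mathbb{R}^{|\Omega|}$ from a given joint distribution:

```python
import itertools
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def entropy_function(joint, n):
    """Return {alpha: H(X_alpha)} for every non-empty alpha, i.e. the
    entropy function g as a point in R^(2^n - 1)."""
    g = {}
    for r in range(1, n + 1):
        for alpha in itertools.combinations(range(n), r):
            # Marginalize the joint pmf onto the coordinates in alpha.
            marginal = {}
            for outcome, p in joint.items():
                key = tuple(outcome[i] for i in alpha)
                marginal[key] = marginal.get(key, 0.0) + p
            g[alpha] = entropy(marginal)
    return g

# Two independent fair bits: H(X1) = H(X2) = 1 and H(X1, X2) = 2.
joint = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
g = entropy_function(joint, 2)
```

Here $n = 2$, so $g$ has $2^2 - 1 = 3$ components, indexed by $\{1\}$, $\{2\}$, and $\{1, 2\}$ (represented above as the tuples `(0,)`, `(1,)`, and `(0, 1)`).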
Our ultimate aim will be to relate such group-characterizable functions to entropy functions. For a given $\alpha \in \Omega$, we denote the intersection of the subgroups $G_i$ for $i \in \alpha$ by $G_\alpha$, which is also a subgroup of $G$.

Definition 2  Let $h$ be a vector in $\mathbb{R}^{|\Omega|}$ with components $h_\alpha$ indexed by $\alpha \in \Omega$. Then $h$ is called group-characterizable if there exist subgroups $G_1, G_2, \ldots, G_n$ of a group $G$ such that $h_\alpha = \log(|G|/|G_\alpha|)$ for all $\alpha$.

Let $\Upsilon_n$ be the set of all group-characterizable functions associated with $n$ groups; that is,

$$\Upsilon_n = \{ h \in \mathbb{R}^{|\Omega|} \mid h \text{ is group-characterizable} \}. \tag{2}$$

4 Information inequalities and group inequalities

It was noted in [3] that an information inequality

$$\sum_{\alpha \in \Omega} b_\alpha H(X_\alpha) \ge 0 \tag{3}$$

is valid if and only if

$$\Gamma^*_n \subset \{ h \in \mathbb{R}^{|\Omega|} \mid b^{\mathrm{T}} h \ge 0 \}, \tag{4}$$

where $b$ is the column vector with components $b_\alpha$ indexed by $\alpha \in \Omega$. This observation, which follows immediately from the definition of $\Gamma^*_n$, highlights the importance to the study of information inequalities of characterizing $\Gamma^*_n$ or, equivalently, its closure $\overline{\Gamma^*_n}$: because $\{ h \in \mathbb{R}^{|\Omega|} \mid b^{\mathrm{T}} h \ge 0 \}$ is closed, condition (4) is equivalent to

$$\overline{\Gamma^*_n} \subset \{ h \in \mathbb{R}^{|\Omega|} \mid b^{\mathrm{T}} h \ge 0 \}. \tag{5}$$

The characterization of $\Gamma^*_n$ or its closure appears to be a highly non-trivial problem in general [6].

Likewise, a group inequality of the form

$$\sum_{\alpha \in \Omega} b_\alpha \log \frac{|G|}{|G_\alpha|} \ge 0 \tag{6}$$

is valid if and only if

$$\Upsilon_n \subset \{ h \in \mathbb{R}^{|\Omega|} \mid b^{\mathrm{T}} h \ge 0 \}. \tag{7}$$

Thus, we are interested in the relationship between $\Upsilon_n$ and $\Gamma^*_n$. We observe that the components of a group-characterizable function are logarithms of rational numbers, so there are at most countably many group-characterizable functions. There are, however, uncountably many entropy functions. Hence, there are far fewer group-characterizable functions than there are entropy functions.
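As a small illustration of Definition 2, the following sketch (helper names are ours; logarithms base 2; $G = \mathbb{Z}_6$ under addition mod 6, with subgroups $G_1 = \{0, 2, 4\}$ and $G_2 = \{0, 3\}$) computes $h_\alpha = \log(|G|/|G_\alpha|)$ directly, and also checks that the same values arise as entropies of coset-valued random variables $X_i = \Lambda + G_i$ with $\Lambda$ uniform on $G$, a construction that reappears in Section 5:

```python
import itertools
from math import log2

# G = Z6 under addition mod 6, with two of its subgroups.
G = list(range(6))
subgroups = [{0, 2, 4}, {0, 3}]   # G1 (order 3), G2 (order 2)

def intersection(alpha):
    """G_alpha: the intersection of G_i for i in alpha."""
    G_alpha = set(G)
    for i in alpha:
        G_alpha &= subgroups[i]
    return G_alpha

def coset(a, H):
    """The coset a + H in Z6 (written additively, so left = right)."""
    return frozenset((a + x) % 6 for x in H)

def entropy(dist):
    """Shannon entropy (in bits) of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

h = {}
for r in (1, 2):
    for alpha in itertools.combinations(range(2), r):
        # Direct definition: h_alpha = log(|G| / |G_alpha|).
        h[alpha] = log2(len(G) / len(intersection(alpha)))

        # Coset construction: Lambda uniform on G, X_i = Lambda + G_i.
        dist = {}
        for a in G:
            key = tuple(coset(a, subgroups[i]) for i in alpha)
            dist[key] = dist.get(key, 0.0) + 1 / len(G)
        assert abs(entropy(dist) - h[alpha]) < 1e-9
```

Each component of $h$ is the logarithm of an integer (by Lagrange's theorem, $|G_\alpha|$ divides $|G|$), consistent with the countability observation above.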
It turns out, however, that $\Upsilon_n$ is almost good enough to characterize $\Gamma^*_n$: in fact $\mathrm{conv}(\Upsilon_n)$, the convex closure of $\Upsilon_n$, is equal to $\overline{\Gamma^*_n}$, the closure of $\Gamma^*_n$, as we shall establish in the next section.

5 The equivalence of $\mathrm{conv}(\Upsilon_n)$ and $\overline{\Gamma^*_n}$

Theorem 1  $\mathrm{conv}(\Upsilon_n) = \overline{\Gamma^*_n}$ for all natural numbers $n$.

To establish this theorem, we require a number of lemmas.

Lemma 1 [1, Theorem 3.1]  If $h$ is group-characterizable, then it is an entropy function, i.e. $h \in \Gamma^*_n$.

Proof. Let $\Lambda$ be a uniformly-distributed discrete random variable with sample space $G$. For any $i \in N$, let the random variable $X_i$ be a function of $\Lambda$ such that $X_i$ is the left coset $aG_i$ if $\Lambda = a$. Let $\alpha$ be an element of $\Omega$. Then

$$P\left((X_i = a_i G_i)_{i \in \alpha}\right) = P\left(\Lambda \in \cap_{i \in \alpha}\, a_i G_i\right) = \frac{|\cap_{i \in \alpha}\, a_i G_i|}{|G|}. \tag{8}$$

Consider $\cap_{i \in \alpha}\, a_i G_i$. If it is non-empty, then let $b$ be an element of $\cap_{i \in \alpha}\, a_i G_i$, and we have

$$\cap_{i \in \alpha}\, a_i G_i = \cap_{i \in \alpha}\, b G_i = b \cap_{i \in \alpha} G_i = b G_\alpha. \tag{9}$$

Therefore, the intersection $\cap_{i \in \alpha}\, a_i G_i$ is either empty or of size $|G_\alpha|$. Moreover, by Lagrange's theorem, there are $|G|/|G_\alpha|$ distinct such non-empty intersections. Hence, we have

$$P\left((X_i = a_i G_i)_{i \in \alpha}\right) = \begin{cases} \dfrac{|G_\alpha|}{|G|} & \text{if } \cap_{i \in \alpha}\, a_i G_i \ne \emptyset, \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

It is then evident that the entropy of $(X_i)_{i \in \alpha}$ is $\log(|G|/|G_\alpha|) = h_\alpha$. Therefore, $h$ is an entropy function. □

Lemma 2 [5, Theorem 1]  $\overline{\Gamma^*_n}$ is a convex cone.

Proof. We shall first demonstrate that $\overline{\Gamma^*_n}$ is convex. Let $u, v$ be the entropy functions of two sets of random variables $Y_1, Y_2, \ldots, Y_n$ and $Z_1, Z_2, \ldots, Z_n$ respectively. Thus $u, v \in \Gamma^*_n$. It suffices to show that for any $0 < b < 1$, $bu + (1-b)v \in \overline{\Gamma^*_n}$ (for it is then straightforward to extend the result to any convex combination of points in the closure of $\Gamma^*_n$).

Let $(Y_1^{(i)}, Y_2^{(i)}, \ldots, Y_n^{(i)})$ for $1 \le i \le k$ be $k$ independent vectors each distributed identically to $(Y_1, Y_2, \ldots, Y_n)$. Likewise, let $(Z_1^{(i)}, Z_2^{(i)}, \ldots, Z_n^{(i)})$ for $1 \le i \le k$ be $k$ independent vectors each distributed identically to
$(Z_1, Z_2, \ldots, Z_n)$. Let $U$ be a random variable, independent of all other random variables, having the distribution

$$p_U(u) = \begin{cases} 1 - \delta - \mu & \text{if } u = 0, \\ \delta & \text{if } u = 1, \\ \mu & \text{if } u = 2. \end{cases} \tag{11}$$

Observe that $H(U) \to 0$ as $\delta, \mu \to 0$.

We now construct the random variables $X_1, X_2, \ldots, X_n$ by

$$X_i = \begin{cases} 0 & \text{if } U = 0, \\ (Y_i^{(1)}, Y_i^{(2)}, \ldots, Y_i^{(k)}) & \text{if } U = 1, \\ (Z_i^{(1)}, Z_i^{(2)}, \ldots, Z_i^{(k)}) & \text{if } U = 2. \end{cases} \tag{12}$$

So for any $\alpha \in \Omega$, we have

$$H(X_\alpha) \le H(X_\alpha, U) = H(U) + H(X_\alpha \mid U) = H(U) + \delta k H(Y_\alpha) + \mu k H(Z_\alpha) \tag{13}$$

and

$$H(X_\alpha) \ge H(X_\alpha \mid U) = \delta k H(Y_\alpha) + \mu k H(Z_\alpha). \tag{14}$$

Combining the two previous inequalities, we have

$$0 \le H(X_\alpha) - (\delta k H(Y_\alpha) + \mu k H(Z_\alpha)) \le H(U). \tag{15}$$

And by taking $\delta = b/k$ and $\mu = (1-b)/k$, we obtain

$$0 \le H(X_\alpha) - (b H(Y_\alpha) + (1-b) H(Z_\alpha)) \le H(U). \tag{16}$$

By taking $k$ sufficiently large, $H(U)$ can be made arbitrarily small, so the entropy function of $X_1, X_2, \ldots, X_n$ can be made arbitrarily close to $bu + (1-b)v$. Hence $bu + (1-b)v \in \overline{\Gamma^*_n}$.
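The final step of the proof rests on $H(U)$ vanishing as $k$ grows. A quick numerical check of this (a sketch with a name of our choosing; entropies in bits), for $b = 1/2$ and the choices $\delta = b/k$, $\mu = (1-b)/k$ used in (16):

```python
from math import log2

def switch_entropy(b, k):
    """H(U) for the switch variable U with P(U=0) = 1 - 1/k,
    P(U=1) = b/k, and P(U=2) = (1-b)/k, i.e. delta = b/k, mu = (1-b)/k."""
    probs = [1 - 1 / k, b / k, (1 - b) / k]
    return -sum(p * log2(p) for p in probs if p > 0)

# The gap between H(X_alpha) and b*H(Y_alpha) + (1-b)*H(Z_alpha) is at
# most H(U), which shrinks toward zero as k increases.
gaps = [switch_entropy(0.5, k) for k in (10, 100, 1000, 10000)]
```

The computed gaps decrease monotonically toward zero, so the entropy function of $X_1, \ldots, X_n$ approximates $bu + (1-b)v$ to any desired accuracy.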