Annealing and the rate distortion problem

Albert E. Parker
Department of Mathematical Sciences
Montana State University
Bozeman, MT
[email protected]

Tomáš Gedeon
Department of Mathematical Sciences
Montana State University
[email protected]

Alexander G. Dimitrov
Center for Computational Biology
Montana State University
[email protected]

Abstract

In this paper we introduce methodology to determine the bifurcation structure of optima for a class of similar cost functions from Rate Distortion Theory, Deterministic Annealing, Information Distortion and the Information Bottleneck Method. We also introduce a numerical algorithm which uses the explicit form of the bifurcating branches to find optima at a bifurcation point.

1 Introduction

This paper analyzes a class of optimization problems

$$\max_{q \in \Delta} G(q) + \beta D(q) \qquad (1)$$

where $\Delta$ is a linear constraint space, $G$ and $D$ are continuous, real-valued functions of $q$, smooth in the interior of $\Delta$, and $\max_{q \in \Delta} G(q)$ is known. Furthermore, $G$ and $D$ are invariant under the group of symmetries $S_N$. The goal is to solve (1) for $\beta = B \in [0, \infty)$.

This type of problem, which appears to be NP-hard, arises in Rate Distortion Theory [1, 2], Deterministic Annealing [3], Information Distortion [4, 5, 6] and the Information Bottleneck Method [7, 8].

The following basic algorithm, various forms of which have appeared in [3, 4, 6, 7, 8], can be used to solve (1) for $\beta = B$.

Algorithm 1. Let $q_0$ be the maximizer of

$$\max_{q \in \Delta} G(q) \qquad (2)$$

and let $\beta_0 = 0$. For $k \geq 0$, let $(q_k, \beta_k)$ be a solution to (1). Iterate the following steps until $\beta_\kappa = B$ for some $\kappa$.

1. Perform a $\beta$-step: let $\beta_{k+1} = \beta_k + d_k$, where $d_k > 0$.
2. Take $q^{(0)}_{k+1} = q_k + \eta$, where $\eta$ is a small perturbation, as an initial guess for the solution $q_{k+1}$ at $\beta_{k+1}$.
3. Optimization: solve $\max_{q \in \Delta} G(q) + \beta_{k+1} D(q)$ to get the maximizer $q_{k+1}$, using the initial guess $q^{(0)}_{k+1}$.

We introduce methodology to perform Algorithm 1 efficiently. Specifically, we implement numerical continuation techniques [9, 10] to effect steps 1 and 2.
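The loop in Algorithm 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: `toy_maximize`, the fixed step size `d`, the perturbation scale `eta`, and the target `B = 1.0` are invented placeholders, and step 3 would in practice call a constrained optimizer over $\Delta$.

```python
import random

def anneal(q0, B, maximize, d=0.1, eta=1e-3):
    """Basic annealing loop (Algorithm 1): continue the maximizer of
    G(q) + beta*D(q) from beta = 0 up to beta = B."""
    q, beta = q0, 0.0
    path = [(beta, q)]
    while beta < B:
        beta = min(beta + d, B)                                 # step 1: beta-step
        guess = [qi + eta * random.uniform(-1, 1) for qi in q]  # step 2: perturb q_k
        q = maximize(guess, beta)                               # step 3: re-optimize
        path.append((beta, q))
    return path

# Hypothetical stand-in solver: F(q, beta) = -(q[0] - beta)^2
# is maximized at q[0] = beta, regardless of the initial guess.
def toy_maximize(guess, beta):
    return [beta]

path = anneal([0.0], B=1.0, maximize=toy_maximize)
print(path[-1])  # final (beta, q) with beta == B
```

The `min(beta + d, B)` clamp guarantees the loop terminates exactly at $\beta = B$, mirroring the stopping condition $\beta_\kappa = B$.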
We show how to detect bifurcation, and we rely on bifurcation theory with symmetries [11, 12, 13] to search for the desired solution branch. This paper concludes with the improved Algorithm 6, which solves (1).

2 The cost functions

The four problems we analyze are from Rate Distortion Theory [1, 2], Deterministic Annealing [3], Information Distortion [4, 5, 6] and the Information Bottleneck Method [7, 8]. In this section we discuss the explicit form of the cost function (i.e. $G(q)$ and $D(q)$) for each of these scenarios.

2.1 The distortion function D(q)

Rate distortion theory is the information-theoretic approach to the study of optimal source coding systems, including systems for quantization and data compression [2]. To define how well a source, the random variable $Y$, is represented by a particular representation using $N$ symbols, which we call $Y_N$, one introduces a distortion function between $Y$ and $Y_N$:

$$D(q(y_N|y)) = D(Y, Y_N) = E_{y, y_N} d(y, y_N) = \sum_y \sum_{y_N} q(y_N|y)\, p(y)\, d(y, y_N)$$

where $d(y, y_N)$ is the pointwise distortion function on the individual elements of $y \in Y$ and $y_N \in Y_N$, and $q(y_N|y)$ is a stochastic map or quantization of $Y$ into a representation $Y_N$ [1, 2]. The constraint space

$$\Delta := \left\{ q(y_N|y) \;\middle|\; \sum_{y_N} q(y_N|y) = 1 \text{ and } q(y_N|y) \geq 0 \;\; \forall y \in Y \right\} \qquad (3)$$

(compare with (1)) is the space of valid quantizers in $\mathbb{R}^n$. A representation $Y_N$ is optimal if there is a quantizer $q^*(y_N|y)$ such that $D(q^*) = \min_{q \in \Delta} D(q)$.

In engineering and imaging applications, the distortion function is usually chosen as the mean squared error [1, 3, 14], $\hat{D}(Y, Y_N) = E_{y, y_N} \hat{d}(y, y_N)$, where the pointwise distortion function $\hat{d}(y, y_N)$ is the Euclidean squared distance. In this case, $\hat{D}(Y, Y_N)$ is a linear function of the quantizer. In [4, 5, 6], the information distortion measure

$$D_I(Y, Y_N) := \sum_{y, y_N} p(y, y_N)\, KL\big(p(x|y_N) \,\|\, p(x|y)\big) = I(X; Y) - I(X; Y_N)$$

is used, where the Kullback-Leibler divergence $KL$ is the pointwise distortion function.
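As a concrete instance of the expected-distortion formula above, the sketch below evaluates $D(q) = \sum_y \sum_{y_N} q(y_N|y)\, p(y)\, d(y, y_N)$ for a toy source with the Euclidean squared distance as the pointwise distortion. The distribution `p_y`, the reproduction points `rep`, and the quantizer `q` are made-up values for illustration only.

```python
def expected_distortion(p_y, q, d):
    """D(q) = sum_y sum_yN q(yN|y) p(y) d(y, yN)."""
    return sum(p_y[y] * q[y][yN] * d(y, yN)
               for y in range(len(p_y))
               for yN in range(len(q[y])))

# Toy source Y = {0, 1, 2} represented by N = 2 symbols with
# reproduction points rep = [0.0, 2.0]; squared-error distortion.
p_y = [0.5, 0.25, 0.25]
rep = [0.0, 2.0]
q = [[1.0, 0.0],   # y = 0 mapped to yN = 0 deterministically
     [0.5, 0.5],   # y = 1 split equally between both symbols
     [0.0, 1.0]]   # y = 2 mapped to yN = 1
d = lambda y, yN: (y - rep[yN]) ** 2
print(expected_distortion(p_y, q, d))  # -> 0.25
```

Each row of `q` sums to 1 with nonnegative entries, so `q` lies in the constraint space $\Delta$ of (3).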
Unlike the pointwise distortion functions usually investigated in information theory [1, 3], this one is nonlinear, it explicitly considers a third space, $X$, of inputs, and it depends on the quantizer $q(y_N|y)$ through

$$p(x|y_N) = \sum_y p(x|y)\, q(y_N|y)\, \frac{p(y)}{p(y_N)}.$$

The only term in $D_I$ which depends on the quantizer is $I(X; Y_N)$, so we can replace $D_I$ with the effective distortion

$$D_{eff}(q) := I(X; Y_N).$$

$D_{eff}(q)$ is the function $D(q)$ from (1) which has been considered in [4, 5, 6, 7, 8].

2.2 Rate Distortion

There are two related methods used to analyze communication systems at a distortion $D(q) \leq D_0$ for some given $D_0 \geq 0$ [1, 2, 3]. In rate distortion theory [1, 2], the problem of finding a minimum rate at a given distortion is posed as a minimal information rate distortion problem:

$$R(D_0) = \min_{q(y_N|y) \in \Delta,\; D(Y; Y_N) \leq D_0} I(Y; Y_N). \qquad (4)$$

This formulation is justified by the Rate Distortion Theorem [1]. A similar exposition using the Deterministic Annealing approach [3] is a maximal entropy problem

$$\max_{q(y_N|y) \in \Delta,\; D(Y; Y_N) \leq D_0} H(Y_N|Y). \qquad (5)$$

The justification for using (5) is Jaynes' maximum entropy principle [15]. These formulations are related since $I(Y; Y_N) = H(Y_N) - H(Y_N|Y)$.

Let $I_0 > 0$ be some given information rate. In [4, 6], the neural coding problem is formulated as an entropy problem as in (5):

$$\max_{q(y_N|y) \in \Delta,\; D_{eff}(q) \geq I_0} H(Y_N|Y), \qquad (6)$$

which uses the nonlinear effective information distortion measure $D_{eff}$.

Tishby et al. [7, 8] use the information distortion measure to pose an information rate distortion problem as in (4):

$$\min_{q(y_N|y) \in \Delta,\; D_{eff}(q) \geq I_0} I(Y; Y_N). \qquad (7)$$

Using the method of Lagrange multipliers, the rate distortion problems (4), (5), (6), (7) can be reformulated as finding the maxima of

$$\max_{q \in \Delta} F(q, \beta) = \max_{q \in \Delta} \left[ G(q) + \beta D(q) \right] \qquad (8)$$

as in (1) where $\beta = B$. For the maximal entropy problem (6),

$$F(q, \beta) = H(Y_N|Y) + \beta D_{eff}(q) \qquad (9)$$

and so $G(q)$ from (1) is the conditional entropy $H(Y_N|Y)$.
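The dependence of $D_{eff}$ on the quantizer can be made concrete with a small sketch: given a joint distribution $p(x, y)$ and a quantizer $q(y_N|y)$, it forms the induced joint $p(x, y_N) = \sum_y q(y_N|y)\, p(x, y)$ and returns $I(X; Y_N)$ in bits. The distributions below are toy values chosen for illustration; with the identity quantizer, $D_{eff}(q)$ reduces to $I(X; Y)$.

```python
from math import log2

def effective_distortion(p_xy, q):
    """D_eff(q) = I(X; Y_N), where p(x, yN) = sum_y q(yN|y) p(x, y)."""
    nx, ny = len(p_xy), len(p_xy[0])
    nN = len(q[0])
    # Induced joint distribution p(x, yN).
    p_xN = [[sum(q[y][yN] * p_xy[x][y] for y in range(ny))
             for yN in range(nN)] for x in range(nx)]
    p_x = [sum(row) for row in p_xN]
    p_N = [sum(p_xN[x][yN] for x in range(nx)) for yN in range(nN)]
    # Mutual information I(X; Y_N) in bits.
    return sum(p_xN[x][yN] * log2(p_xN[x][yN] / (p_x[x] * p_N[yN]))
               for x in range(nx) for yN in range(nN)
               if p_xN[x][yN] > 0)

# Toy joint p(x, y) on X = {0, 1}, Y = {0, 1}; the identity quantizer
# copies Y into Y_N, so I(X; Y_N) = I(X; Y) here.
p_xy = [[0.4, 0.1],
        [0.1, 0.4]]
q_identity = [[1.0, 0.0],
              [0.0, 1.0]]
print(effective_distortion(p_xy, q_identity))
```

A quantizer that collapses all of $Y$ onto one symbol would instead drive $I(X; Y_N)$ to zero, which is why maximizing $D_{eff}$ over $\Delta$ is nontrivial.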
For the minimal information rate distortion problem (7),

$$F(q, \beta) = -I(Y; Y_N) + \beta D_{eff}(q) \qquad (10)$$

and so $G(q) = -I(Y; Y_N)$.

In [3, 4, 6], one explicitly considers $B = \infty$. For (9), this involves taking $\lim_{\beta \to \infty} \max_{q \in \Delta} F(q, \beta) = \max_{q \in \Delta} D_{eff}(q)$, which in turn gives $\min_{q(y_N|y) \in \Delta} D_I$. In Rate Distortion Theory and the Information Bottleneck Method, one could be interested in solutions to (8) for finite $B$, which takes into account a tradeoff between $I(Y; Y_N)$ and $D_{eff}$. For lack of space, here we consider (9) and (10). Our analysis extends easily to similar formulations which use a norm-based distortion such as $\hat{D}(q)$, as in [3].

3 Improving the algorithm

We now turn our

