AN ALTERNATING MINIMIZATION ALGORITHM FOR NON-NEGATIVE MATRIX APPROXIMATION

JOEL A. TROPP

Abstract. Matrix approximation problems with non-negativity constraints arise during the analysis of high-dimensional non-negative data. Applications include dimensionality reduction, feature extraction, and statistical factor analysis. This paper discusses a general algorithm for solving these problems with respect to a large class of error measures. The primary goal of this article is to develop a rigorous convergence theory for the algorithm. It also touches on experimental results of S. Sra [1], which indicate the superiority of this method over some previously proposed techniques.

Date: 8 May 2003.
J. Tropp is with The University of Texas, Austin, TX 78712 USA, [email protected].

1. Introduction

Suppose that A is a d × N matrix whose columns are vectors of non-negative data. The problem of extracting r linear features from A can be stated as solving the approximation problem A ≈ VH, where V is a d × r matrix of feature vectors and H is an r × N matrix of coefficients. We quantify the approximation error using a measure that reflects the provenance of the data. For example, if we are working with probability distributions, it makes sense to use the variational distance or the Kullback-Leibler divergence.

A common way to find features is to compute a partial singular value decomposition (SVD) of the data matrix [2]. Although the partial SVD provides the best approximation to A in the sense of total least squares, it yields feature vectors that have both positive and negative components. Such features have no interpretation if the data are non-negative. If the data consist of pixel intensities from images, what does a negative intensity signify? If the data consist of word counts from documents, what does a negative word count mean?

To address this metaphysical complaint, several groups of researchers have proposed adding non-negativity constraints to the approximation problem. Paatero and Tapper advocated solving

    min_{V, H ≥ 0} ||VH − A||_F

for statistical factor analysis [3]. In their first paper on the subject [3], they suggested an alternating least-squares algorithm for solving the problem. Later, Paatero proposed instead a modified gradient descent [4] and a conjugate gradient method [5]. Independently, Lee and Seung considered the same optimization problem for some applications in machine learning. They originally solved it using an alternating projected gradient technique [6]. More recently, they have provided two optimally scaled gradient descent algorithms which respectively minimize the Frobenius norm and the Kullback-Leibler divergence of the approximation error [7]. None of these algorithms has been compared in the literature, and proper convergence proofs are scarce.

We shall call the problem of approximating A ≈ VH subject to the constraints V, H ≥ 0 non-negative matrix approximation (NNMA). The present work discusses an alternating minimization algorithm for solving NNMA problems where the approximation error is measured using a very general type of function. The primary goal is to provide a rigorous account of the convergence of this algorithm. I shall also touch on the work of my colleague Suvrit Sra, who has performed some preliminary experiments to compare the actual behavior of this algorithm with some of the other proposed methods [1].
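To fix ideas before the formal development, the alternating least-squares idea mentioned above is easy to sketch in code. The following Python fragment is a minimal illustration, not an algorithm from the paper: it assumes the Frobenius-norm objective, solves each block subproblem exactly with SciPy's non-negative least-squares routine, and the function name nnma_als together with all parameter choices (rank, iteration count, random initialization) are assumptions made for the example.

# Illustrative sketch only (not the paper's algorithm): alternating
# minimization for  min_{V, H >= 0} ||VH - A||_F,  solved by exact
# non-negative least-squares updates of H and V in turn.
# Shapes follow the text: A is d x N, V is d x r, H is r x N.
import numpy as np
from scipy.optimize import nnls

def nnma_als(A, r, iters=50, seed=0):
    d, N = A.shape
    rng = np.random.default_rng(seed)
    V = rng.random((d, r))      # random non-negative initial guesses
    H = rng.random((r, N))
    for _ in range(iters):
        # H-update: for each column j, min_{h >= 0} ||V h - A[:, j]||_2
        for j in range(N):
            H[:, j], _ = nnls(V, A[:, j])
        # V-update: for each row i, min_{v >= 0} ||H^T v - A[i, :]^T||_2
        for i in range(d):
            V[i, :], _ = nnls(H.T, A[i, :])
    return V, H

# Usage: approximate a small random non-negative matrix with r = 2 features.
A = np.random.default_rng(1).random((5, 8))
V, H = nnma_als(A, r=2)
print(np.linalg.norm(V @ H - A, "fro"))

Because each sweep solves its block subproblem exactly, the objective ||VH − A||_F never increases from one iteration to the next. This monotonicity is precisely the property that the convergence theory of Section 2 formalizes.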
2. Mathematical Background

2.1. Point-to-Set Maps. To understand the convergence of the algorithm, we need some elementary concepts from the theory of point-to-set maps. The power set P(Y) of a set Y is the collection of all subsets of Y. A point-to-set map Ω from the set X to the set Y is a function Ω : X → P(Y). The composition of two point-to-set maps Ω_1 : X → P(Y) and Ω_2 : Y → P(Z) is defined by (Ω_2 ∘ Ω_1)(x) = ∪_{y ∈ Ω_1(x)} Ω_2(y) [8].

Suppose that X and Y are endowed with topologies so that we may speak of convergence. A point-to-set map Ω is closed at x̂ ∈ X if {x_k} ⊂ X, x_k → x̂, y_k ∈ Ω(x_k) and y_k → ŷ together imply that ŷ ∈ Ω(x̂). In words, every convergent sequence whose elements lie in the sets {Ω(x_k)} must have its limit in the set Ω(x̂). A map is closed on X if it is closed at each point of X. The composition of two closed maps is always closed [8].

We also define two different types of stationary points. A fixed point of the map Ω : X → P(X) is a point x for which {x} = Ω(x). Meanwhile, a generalized fixed point of Ω is a point x for which x ∈ Ω(x) [9].

2.2. Iterative Algorithms. Many procedures in mathematical programming can be described using the language of point-to-set maps. An algorithm is a point-to-set map Ω : X → P(X). Given an initial point x_0, an algorithm generates a sequence of points via the rule x_{k+1} ∈ Ω(x_k). Now suppose that J : X → R_+ is a continuous, non-negative function. An algorithm Ω is monotonic with respect to J whenever y ∈ Ω(x) implies that J(y) ≤ J(x). If, in addition, y ∈ Ω(x) and J(y) = J(x) imply that y = x, then we say that the algorithm is strictly monotonic.

Theorem 2.1 (Zangwill [10]). Let Ω be an algorithm that is monotonic with respect to J. Given an initial point x_0, suppose that the algorithm generates a sequence {x_k} that lies in a compact set. Then the sequence has at least one accumulation point x̂, and J(x̂) = lim J(x_k). Moreover, if Ω is closed at x̂, then x̂ is a generalized fixed point of the algorithm.

Theorem 2.2 (Meyer [9]). Assume that the algorithm Ω is strictly monotonic with respect to J and that it generates a sequence {x_k} which lies in a compact set. If Ω is closed at an accumulation point x̂ of {x_k}, then x̂ is a (strong) fixed point of Ω. Moreover, if X is normed, ||x_{k+1} − x_k|| → 0. It follows that {x_k} converges in norm to x̂ or that the accumulation points of {x_k} form a continuum.

2.3. Infimal Maps. Suppose that f : X × Y → R_+. Then we may define the infimal map M_y : X → P(Y) by M_y(x) = arg min_{y ∈ Y} f(x, y). The infimal map M_x : Y → P(X), given by M_x(y) = arg min_{x ∈ X} f(x, y), is defined similarly.

Theorem 2.3 (Dantzig-Folkman-Shapiro [11]). If f(x̂, ·) is continuous on Y, then M_y is closed at x̂.

Theorem 2.4 (Fiorot-Huard [12]). If the infimal maps M_x and M_y are both single-valued, then the algorithm Ω def= M_x ∘ M_y is strictly monotonic with respect to f.

In Theorem 2.4, the definition of Ω would more properly read Ω def=
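As a concrete toy instance of Theorem 2.4 (an assumed example of my own, not one from the paper), take f(x, y) = (x − y)² + x² + y² on X = Y = R. Both infimal maps are single-valued with closed forms, so the composite algorithm Ω = M_x ∘ M_y is strictly monotonic with respect to f, and its iterates contract toward the unique fixed point x = 0. A minimal sketch:

# Toy illustration of Theorem 2.4 (an assumed example, not from the paper).
# For f(x, y) = (x - y)^2 + x^2 + y^2 the infimal maps have closed forms,
# and the composite algorithm Omega = M_x o M_y decreases f at every sweep.

def f(x, y):
    return (x - y) ** 2 + x ** 2 + y ** 2

def M_y(x):        # arg min over y of f(x, .)  ->  y = x / 2
    return x / 2.0

def M_x(y):        # arg min over x of f(., y)  ->  x = y / 2
    return y / 2.0

x = 1.0
for k in range(5):
    y = M_y(x)                          # minimize over the second block
    x_new = M_x(y)                      # minimize over the first block
    print(k, f(x, y), f(x_new, y))      # f(x_new, y) <= f(x, y) at each step
    x = x_new                           # iterates x_k = 4^(-k) shrink to 0

Since each half-step is an exact minimization, the printed values of f decrease monotonically, consistent with the strict monotonicity asserted by Theorem 2.4.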