DOC PREVIEW
LSU CSC 4351 - Memory-optimal evaluation of expression trees involving large objects

This preview shows page 1-2-3-4-5-6 out of 19 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Memory optimal evaluation of expression trees involving large objects Chi Chung Lama 1 Thomas Rauberb Gerald Baumgartnerc Daniel Cociorvaa 2 P Sadayappana a Department of Computer Science and Engineering The Ohio State University Columbus OH 43210 USA of Bayreuth Lehrstuhl fu r Angewandte Informatik II D 95447 Bayreuth Germany c Department of Computer Science Louisiana State University Baton Rouge LA 70803 USA b University Abstract The need to evaluate expression trees involving large objects arises in scientific computing applications such as electronic structure calculations Often the tree node objects are so large that only a subset of them can fit into memory at a time This paper addresses the problem of finding an evaluation order of the nodes in a given expression tree that uses the least amount of memory We present an algorithm that finds an optimal evaluation order in n log2 n time for an n node expression tree and prove its correctness We demonstrate the utility of our algorithm using representative equations from quantum chemistry Keywords expression tree evaluation order memory minimization register allocation 1 Introduction This paper addresses the problem of finding an evaluation order of the nodes in a given expression tree that minimizes memory usage The expression tree must be evaluated in some bottom up order i e the evaluation of a node cannot precede the evaluation of any of its children The nodes of the expression tree are large data objects whose sizes are given If the total size of the data objects is so large that they cannot all fit into memory at the same time space for the data objects has to be allocated and deallocated dynamically Due to the parent child dependence relation a data object cannot be deallocated until its parent node data object has been evaluated The objective is to minimize the maximum memory usage during the evaluation of the entire expression tree This problem arises for example in the accurate modeling of the electronic structure of atoms and molecules in quantum chemistry 1 2 as well as in some computational physics codes modeling the electronic properties of semiconductors and metals 3 4 5 Computational approaches to modeling the structure and interactions of molecules the electronic and optical This work was supported in part by the National Science Foundation under grants DMR 9520319 CHE 0121676 CNS 0509467 and CCF 0541409 Corresponding author Tel 1 225 578 2191 fax 1 225 578 1465 1 Current address Chemical Abstracts Service Columbus OH 43210 USA 2 Current address Department of Chemical Physiology The Scripps Research Institute La Jolla CA 92037 USA Preprint submitted to Elsevier April 28 2010 properties of molecules the heat and rate of chemical reactions etc are crucial to the understanding of chemical processes in real world systems Examples of applications include combustion and atmospheric chemistry chemical vapor deposition protein structure and enzymatic chemistry and industrial chemical processing The computational domain that we consider is also extremely compute intensive and consumes significant computer resources at national supercomputer centers Many of these codes are limited in the size of the problem that they can currently solve because of memory and performance limitations In this class of computations the final result to be computed can be expressed in terms of tensor contractions essentially a collection of multi dimensional summations of the product of several input arrays Due to commutativity associativity and distributivity there are many different ways to compute the final result and they could differ widely in the number of floating point operations required Consider the following expression X Aacik Bbe f l Cd f jk Dcdel S abi j cde f kl where typical index ranges are on the order of tens to a few thousands If this expression is directly translated into code with ten nested loops for indices a l the total number of arithmetic operations required will be 4 N 10 if the range of each index a l is N Instead the same expression can be rewritten by use of associative and distributive laws 6 7 8 X X X C A B D S abi j d f jk acik be f l cdel ck df el This corresponds to the formula sequence shown in Fig 1 a and can be directly translated into code as shown in Fig 1 b This form only requires 6 N 6 operations However additional space is required to store temporary arrays T 1 and T 2 Often the space requirements for the temporary arrays poses a serious problem For this example abstracted from a quantum chemistry model the array extents along indices a d are the largest while the extents along indices i l are the smallest Therefore the size of temporary array T 1 would dominate the total memory requirement Thus although the latter form is far more economical in terms of the number of operations its implementation will require the use of temporary intermediate arrays to hold the partial results of the parenthesized array subexpressions One approach to reducing the memory requirements for the computation is through loop fusion By merging the common outer loops of the producer and consumer loop nests for an intermediate array the dimensions corresponding to the fused loops can be eliminated from the intermediate array In our example loop fusion allows T 1 to be reduced to a scalar and T 2 to a 2 dimensional array without changing the number of operations as illustrated in Fig 1 c Since different fusion choices are often not mutually compatible it is necessary to enumerate all fusion choices to find the loop structure that minimizes the memory requirements 9 10 11 If even after loop fusion some intermediates do not fit in memory it is necessary to tile these intermediates and move tiles in and out of disk 12 13 While loop fusion usually results in large memory reductions in the context of tensor contractions it has a detrimental effect it reduces temporal and spatial locality As a result the computation can become significantly more expensive E g while the tensor contractions in Fig 1 b can be implemented using index permutations and BLAS matrix multiplications which use the cache effectively this is not possible anymore with the fused code in Fig 1 c 2 T 1bcd f X Bbe f l Dcdel T 2bc jk X T 1bcd f Cd f jk X T 2bc jk Aacik T1 0 T2 0 S 0 for b c d e f l h T1bcdf Bbefl Dcdel for b c d f j k h T2bcjk T1bcdf Cdfjk for a b c i j k h Sabij T2bcjk Aacik el df S abi j ck a Formula sequence b Direct implementation unfused code S


View Full Document

LSU CSC 4351 - Memory-optimal evaluation of expression trees involving large objects

Download Memory-optimal evaluation of expression trees involving large objects
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Memory-optimal evaluation of expression trees involving large objects and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Memory-optimal evaluation of expression trees involving large objects 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?