DOC PREVIEW
Berkeley COMPSCI C267 - Parallel RI-MP2

This preview shows page 1-2-3-4 out of 11 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 11 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Parallel RI-MP2Matt GoldeyMøller–Plesset perturbationtheoryCommon correction for energies obtainedby the Hartree-Fock Self-ConsistentField methodComputational costs are significant forlarge systems, involving polynomialscaling, depending on approximationsusedMeritInclusion of electron correlationImages from Wikipedia<http://en.wikipedia.org/wiki/M%C3%B8ller%E2%80%93Plesset_perturbation_theory>Contributions represented as (ia|jb)Formed via integral transformations fromSCF calculationsRI approximationInstead of directly forming (ia|jb), (ia|X) isformed for the auxiliary basis X (alsodenoted P,Q)(P|Q)-1/2 used to form Bia=(ia|P)(P|Q)-1/2Largest costing step is multiplying(ia|jb)=Bia*Bjb for all a,b, auxiliaryfunctionsWorkloadNeed to span virtual space for each occupied pair (through alloccupied space)Simplest implementation:Loop iRead Bia ∀a,QLoop jRead Bjb ∀b,QForm (ia|jb)=Bia*(Bjb)TCalculate Energy contributionsNocc*Nocc matrix multiplications performedNocc + Nocc*Nocc B matrices read from hard drive -- ~Nocc2*Nvirt*X inhard drive readsNeed to span virtual space for each occupied pair (through alloccupied space)Simplest implementation:Loop iRead Bia ∀a,QLoop jRead Bjb ∀b,QForm (ia|jb)=Bia*(Bjb)TCalculate Energy contributionsNocc*Nocc matrix multiplications performedNocc + Nocc*Nocc B matrices read from hard drive -- ~Nocc2*Nvirt*X inhard drive readsAlgorithm OptimizationCurrent description of serial algorithm not complete -- only need tospan unique pairs of i,j --the upper or lower triangular portion ofthe occupied spaceLoop iRead Bia ∀a,QLoop j≥iRead Bjb ∀b,QForm (ia|jb)=Bia*(Bjb)TCalculate weighted energy contributionsNocc*(Nocc+1)/2 matrix multiplications performed -- computation cutby 1/2Nocc + Nocc*(Nocc+1)/2 B matrices read from hard drive --~Nocc(Nocc+1)/2*Nvirt*X in hard drive readsEven this is not optimized since reading in Bjb is unnecessary forj==iFurther, given that these matrix multiplications are relatively small,want to distribute work among processorsMemory ConsiderationsBia & Bjb are (Nvirt * X) long and are read in from hard drive asneededSerially, 2*(Nvirt*X)+Nvirt*Nvirt stored in memoryMemory delimited by earlier step to 3*X2Given Nvirt~N, X~4*N,Memory considerations for largest possible systems are3*4*4*N2=48*N2Memory occupied using this serial implementation for nprocessor shared memory system isnproc*(2*(Nvirt*X)+Nvirt*Nvirt)~nproc*(9*n2)For 8 processors, this amounts to 72*N2 in memory at a time,limiting system size unnecessarily -- need a different algorithmfor parallel computation for a work-distributed systemMemory-guided WorkDistributionNeed to minimize memory usage as wellas minimize number of reads from thehard drive -- which is expected tobecome dominant in the large systemlimitHave to satisfy both criteria in workdistribution schemeMemory-guided WorkDistributionSolutions:Hard drive read batching -- using the space availablePivoting -- using what is in memory to direct the next batchLoop over batches (determined by memory)Read Bia ∀i ∈(first batch-1 elements of batch)Loop over Nocc∉batch (Nocc>batch)Form all possible (ia|jb), including i==j (ia|ib)This provides a solution to the serial bottleneck of hard drive readsReduces the number of matrices read from ~Nocc2 (initialimplementation)New hard drive I/O costs:Nreads=Nocc+(Nocc-(batchsize-1)-1)+(Nocc-2*(batchsize-1)-1)+…Nreads=Nocc(Nbatches+1)-Nbatches-(batchsize-1)(Nbatches(Nbatches+1)/2)Nreads~Nocc*Nbatches-(batchsize-1)Nbatches2/2Nreads~Nocc2/(batchsize-1)-Nocc2/(2*(batchsize-1))~=Nocc2/(2*(batchsize-1))This solves the hard drive read problem in serial, but does it work forparallel?X X X X X X X X X XX X X X X X X X XX X X X X X X XXXXXXXXX X X X X X X X X XX X X X X X X X XX X X X X X X XX X X X X X XX X X X X XX X X X XXXXXX X X X X X X X X XX X X X X X X X XX X X X X X X XX X X X X X XX X X X X XX X X X XX X X XX XXBatch size limitationsTypical system sizes constrain the batch size toa maximum of ~9-11 B matrices in memory atonceUsing 9 B matrices, ignoring edge cases, 8computations must be performed at eachstep, with each computation of equivalentcostHowever, this method requires synchronizationafter each matrix multiplication -- a potentialbottleneck for anisotropic systemsDistributed Memory, theFutureFurther modeling is needed for extendingthis method to distributed memorysystems since batches will not haveequivalent workloads -- the occupiedspace needing to be spanned isdiminished by each


View Full Document

Berkeley COMPSCI C267 - Parallel RI-MP2

Documents in this Course
Lecture 4

Lecture 4

52 pages

Split-C

Split-C

5 pages

Lecture 5

Lecture 5

40 pages

Load more
Download Parallel RI-MP2
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Parallel RI-MP2 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Parallel RI-MP2 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?