Parallel RI-MP2
Matt Goldey

Møller–Plesset perturbation theory
- A common correction to energies obtained from the Hartree–Fock self-consistent field (SCF) method
- Merit: inclusion of electron correlation
- Computational cost is significant for large systems, with polynomial scaling that depends on the approximations used
- Images from Wikipedia: http://en.wikipedia.org/wiki/M%C3%B8ller%E2%80%93Plesset_perturbation_theory

RI approximation
- Correlation contributions are represented as the integrals (ia|jb), formed via integral transformations from the SCF calculation
- In the RI approximation, instead of forming (ia|jb) directly, (ia|X) is formed for an auxiliary basis X (also denoted P, Q)
- (P|Q)^(-1/2) is used to form B_ia = (ia|P)(P|Q)^(-1/2)
- The most expensive step is the multiplication (ia|jb) = B_ia * B_jb over all a, b, and auxiliary functions

Workload
- Need to span the virtual space for each occupied pair (running through all of the occupied space)
- Simplest implementation:
    Loop i
        Read B_ia ∀ a, Q
        Loop j
            Read B_jb ∀ b, Q
            Form (ia|jb) = B_ia * (B_jb)^T
            Calculate energy contributions
- Nocc * Nocc matrix multiplications performed
- Nocc + Nocc*Nocc B matrices read from the hard drive -- ~Nocc^2 * Nvirt * X values in hard-drive reads

Algorithm Optimization
- The serial algorithm above is not complete as a description -- only unique pairs i, j need to be spanned, i.e. the upper (or lower) triangular portion of the occupied space:
    Loop i
        Read B_ia ∀ a, Q
        Loop j ≥ i
            Read B_jb ∀ b, Q
            Form (ia|jb) = B_ia * (B_jb)^T
            Calculate weighted energy contributions
- Nocc*(Nocc+1)/2 matrix multiplications performed -- computation cut in half
- Nocc + Nocc*(Nocc+1)/2 B matrices read from the hard drive -- ~Nocc*(Nocc+1)/2 * Nvirt * X values in hard-drive reads
- Even this is not optimized, since reading B_jb is unnecessary when j == i
- Further, given that these matrix multiplications are relatively small, we want to distribute the work among processors.
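The triangular-pair loop above can be sketched as follows. This is a minimal illustration with toy dimensions and random stand-in data, not the production implementation: the real B matrices come from the RI transformation and are read from the hard drive, the "reads" here are just array indexing, and every name, size, and orbital energy is an assumption. The weighting shows why off-diagonal pairs must be counted twice.

```python
import numpy as np

# Toy dimensions and random stand-in data (illustrative assumptions).
n_occ, n_virt, n_aux = 4, 10, 40
rng = np.random.default_rng(0)
B = rng.standard_normal((n_occ, n_virt, n_aux))  # B[i] ~ B_ia^Q, shape (Nvirt, X)
eps_occ = -(rng.random(n_occ) + 1.0)   # occupied orbital energies (negative)
eps_virt = rng.random(n_virt) + 0.5    # virtual orbital energies (positive)

e_mp2 = 0.0
for i in range(n_occ):
    B_i = B[i]                    # "read" B_ia for all a, Q
    for j in range(i, n_occ):     # j >= i: unique pairs only
        B_j = B[j]                # "read" B_jb for all b, Q
        iajb = B_i @ B_j.T        # (ia|jb), shape (Nvirt, Nvirt)
        # Denominator e_i + e_j - e_a - e_b for this occupied pair.
        denom = (eps_occ[i] + eps_occ[j]
                 - eps_virt[:, None] - eps_virt[None, :])
        contrib = np.sum(iajb * (2.0 * iajb - iajb.T) / denom)
        # Weighted contribution: off-diagonal pairs appear twice in
        # the full double loop over i and j.
        e_mp2 += contrib if i == j else 2.0 * contrib
```

Summing over j ≥ i with the factor of 2 reproduces the full i, j double loop exactly, because (ia|jb) for the swapped pair is just the transpose of the block already in memory.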
Memory Considerations
- B_ia and B_jb are (Nvirt * X) in size and are read from the hard drive as needed
- Serially, 2*(Nvirt*X) + Nvirt*Nvirt values are stored in memory
- Memory is already delimited by an earlier step to 3*X^2; given Nvirt ~ N and X ~ 4N, the memory requirement for the largest possible systems is 3*4*4*N^2 = 48*N^2
- The memory occupied by this serial implementation on an nproc-processor shared-memory system is nproc*(2*(Nvirt*X) + Nvirt*Nvirt) ~ nproc*(9*N^2)
- For 8 processors this amounts to 72*N^2 in memory at a time, limiting system size unnecessarily -- a different algorithm is needed for parallel computation on a work-distributed system

Memory-guided Work Distribution
- Need to minimize memory usage as well as the number of hard-drive reads, which are expected to become dominant in the large-system limit
- The work-distribution scheme has to satisfy both criteria
- Solutions:
    - Hard-drive read batching -- using the space available
    - Pivoting -- using what is already in memory to direct the next batch
- Batched algorithm:
    Loop over batches (size determined by available memory)
        Read B_ia ∀ i in the batch (the first batchsize-1 elements of the batch)
        Loop over occupied indices ∉ batch (beyond the batch)
            Form all possible (ia|jb), including i == j, i.e. (ia|ib)
- This addresses the serial bottleneck of hard-drive reads, reducing the number of matrices read from ~Nocc^2 (initial implementation)
- New hard-drive I/O costs:
    Nreads = Nocc + (Nocc - (batchsize-1) - 1) + (Nocc - 2*(batchsize-1) - 1) + …
    Nreads = Nocc*(Nbatches+1) - Nbatches - (batchsize-1)*Nbatches*(Nbatches+1)/2
    Nreads ~ Nocc*Nbatches - (batchsize-1)*Nbatches^2/2
    Nreads ~ Nocc^2/(batchsize-1) - Nocc^2/(2*(batchsize-1)) ~ Nocc^2/(2*(batchsize-1))
- This solves the hard-drive read problem in serial, but does it work in parallel?

[Figure: triangular grids of occupied-pair indices, showing the triangle of remaining pairs shrinking as each batch is completed]

Batch size limitations
- Typical system sizes constrain the batch size to a maximum of ~9-11 B matrices in memory at once.
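The read-batching scheme above can be checked with a small read-counting simulation. This is an illustrative reconstruction under assumptions, not the production code: it holds batchsize-1 "pivot" B matrices in memory, streams each later occupied index through the remaining slot, and counts hard-drive reads instead of doing linear algebra. The function name and bookkeeping are hypothetical.

```python
# Hypothetical simulation of the batching scheme: count reads and
# record which unique occupied pairs (i <= j) each batch covers.
def batched_pair_reads(n_occ, batch_size):
    reads = 0
    pairs = set()                       # unique (i, j) pairs formed
    pivot_start = 0
    while pivot_start < n_occ:
        stop = min(pivot_start + batch_size - 1, n_occ)
        pivots = list(range(pivot_start, stop))
        reads += len(pivots)            # read the pivot B matrices
        for x, i in enumerate(pivots):  # pairs inside the batch, incl. i == j
            for j in pivots[x:]:
                pairs.add((i, j))
        for j in range(stop, n_occ):    # stream the rest through one slot
            reads += 1
            for i in pivots:
                pairs.add((i, j))
        pivot_start = stop
    return reads, pairs

# For Nocc = 20 and batchsize = 6: 50 reads, versus
# Nocc + Nocc*(Nocc+1)/2 = 230 for the triangular loop.
reads, pairs = batched_pair_reads(20, 6)
```

The count matches the ~Nocc^2/(2*(batchsize-1)) scaling up to lower-order terms, and the simulation confirms that every unique pair is still covered exactly once.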
- Using 9 B matrices and ignoring edge cases, 8 computations must be performed at each step, each of equivalent cost
- However, this method requires synchronization after each matrix multiplication -- a potential bottleneck for anisotropic systems

Distributed Memory, the Future
- Further modeling is needed to extend this method to distributed-memory systems, since batches will not have equivalent workloads -- the occupied space that must be spanned is diminished by each batch
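The per-step synchronization point can be sketched with a thread pool. Everything here is a toy assumption (thread count, shapes, random data): 8 workers each multiply the streamed B_j against one in-memory pivot matrix, and `pool.map` only returns once the slowest worker finishes, which is exactly the per-step barrier noted above.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Toy shapes and data standing in for one pivot batch of 8 B matrices
# plus a few streamed B_j matrices (illustrative assumptions).
n_virt, n_aux = 10, 40
rng = np.random.default_rng(1)
pivots = [rng.standard_normal((n_virt, n_aux)) for _ in range(8)]
streamed = [rng.standard_normal((n_virt, n_aux)) for _ in range(3)]

results = []
with ThreadPoolExecutor(max_workers=8) as pool:
    for B_j in streamed:
        # All 8 (ia|jb) blocks for this j; list(pool.map(...)) blocks
        # until every worker is done -- the synchronization point.
        results.append(list(pool.map(lambda B_i, B_j=B_j: B_i @ B_j.T,
                                     pivots)))
```

If the 8 multiplications have unequal cost, every thread waits for the slowest one at each step, which is why anisotropic workloads make this barrier a bottleneck.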