Berkeley COMPSCI C267 - Final Project Suggestions

CS 267: Applications of Parallel Computers
Final Project Suggestions
James Demmel
www.cs.berkeley.edu/~demmel/cs267_Spr06
04/06/2006 CS267 Lecture 22a

Outline
• Kinds of projects
  • Evaluating and improving the performance of a parallel application
    • “Application” could be a full scientific application, or an important kernel
  • Parallelizing a sequential application
    • Other kinds of performance improvements possible too, e.g. memory hierarchy tuning
  • Devise a new parallel algorithm for some problem
  • Porting a parallel application or systems software to a new architecture
• Examples of previous projects (all on-line)
• Upcoming guest lecturers
  • See their previous lectures, or contact them, for project ideas
• Suggested projects

CS267 Class Projects from 2004
• BLAST Implementation on BEE2 - Chen Chang
• PFLAMELET: An Unsteady Flamelet Solver for Parallel Computers - Fabrizio Bisetti
• Parallel Pattern Matcher - Frank Gennari, Shariq Rizvi, and Guille Díez-Cañas
• Parallel Simulation in Metropolis - Guang Yang
• A Survey of Performance Optimizations for Titanium Immersed Boundary Simulation - Hormozd Gahvari, Omair Kamil, Benjamin Lee, Meling Ngo, and Armando Solar
• Parallelization of oopd1 - Jeff Hammel
• Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code - Amir Kamil, Ben Schwarz, and Jimmy Su

CS267 Class Projects from 2004 (cont)
• Communication Savings With Ghost Cell Expansion For Domain Decompositions Of Finite Difference Grids - C. Zambrana Rojas and Mark Hoemmen
• Parallelization of Phylogenetic Tree Construction - Michael Tung
• UPC Implementation of the Sparse Triangular Solve and NAS FT - Christian Bell and Rajesh Nishtala
• Widescale Load Balanced Shared Memory Model for Parallel Computing - Sonesh Surana, Yatish Patel, and Dan Adkins

Planned Guest Lecturers
• Katherine Yelick (UPC, heart modeling)
• David Anderson (volunteer computing)
• Kimmen Sjolander (phylogenetic analysis of proteins – SATCHMO – Bonnie Kirkpatrick)
• Julian Borrill (astrophysical data analysis)
• Wes Bethel (graphics and data visualization)
• Phil Colella (adaptive mesh refinement)
• David Skinner (tools for scaling up applications)
• Xiaoye Li (sparse linear algebra)
• Osni Marques and Tony Drummond (ACTS Toolkit)
• Andrew Canning (computational neuroscience)
• Michael Wehner (climate modeling)

Suggested projects (1)
• Weekly research group meetings on these and related topics (see J. Demmel and K. Yelick)
• Contribute to upcoming ScaLAPACK release (JD)
  • Proposal, talk at www.cs.berkeley.edu/~demmel; ask me for latest
  • Performance evaluation of existing parallel algorithms
    • Ex: New eigensolvers based on successive band reduction
  • Improved implementations of existing parallel algorithms
    • Ex: Use UPC to overlap communication, computation
  • Many serial algorithms to be parallelized
    • See following slides

Missing Drivers in Sca/LAPACK

Problem                Algorithm                 LAPACK       ScaLAPACK
Linear Equations       LU                        xGESV        PxGESV
                       Cholesky                  xPOSV        PxPOSV
                       LDL^T                     xSYSV        missing
Least Squares (LS)     QR                        xGELS        PxGELS
                       QR+pivot                  xGELSY       missing
                       SVD/QR                    xGELSS       missing
                       SVD/D&C                   xGELSD       missing (intent?)
                       SVD/MRRR                  missing      missing
                       QR + iterative refine.    missing      missing
Generalized LS         LS + equality constr.     xGGLSE       missing
                       Generalized LM            xGGGLM       missing
                       Above + Iterative ref.    missing      missing

More missing drivers

Problem                        Algorithm               LAPACK       ScaLAPACK
Symmetric EVD                  QR / Bisection+Invit    xSYEV / X    PxSYEV / X
                               D&C                     xSYEVD       PxSYEVD
                               MRRR                    xSYEVR       missing
Nonsymmetric EVD               Schur form              xGEES / X    missing driver
                               Vectors too             xGEEV / X    missing driver
SVD                            QR                      xGESVD       PxGESVD
                               D&C                     xGESDD       missing (intent?)
                               MRRR                    missing      missing
                               Jacobi                  missing      missing
Generalized Symmetric EVD      QR / Bisection+Invit    xSYGV / X    PxSYGV / X
                               D&C                     xSYGVD       missing (intent?)
                               MRRR                    missing      missing
Generalized Nonsymmetric EVD   Schur form              xGGES / X    missing
                               Vectors too             xGGEV / X    missing
Generalized SVD                Kogbetliantz            xGGSVD       missing (intent)
                               MRRR                    missing      missing

Suggested projects (2)
• Contribute to sparse linear algebra (JD & KY)
  • Performance tuning to minimize latency and bandwidth costs, both to memory and between processors (sparse => few flops per memory reference or word communicated)
  • Typical methods (e.g. CG = conjugate gradient) do some number of dot products and saxpys for each SpMV, so communication cost is O(# iterations)
  • Our goal: Make latency cost O(1)!
  • Requires reorganizing algorithms drastically, including replacing SpMV by a new kernel [Ax, A^2x, A^3x, … , A^kx], which can be done with O(1) messages
  • Projects
    • Study scalability bottlenecks of current CG on real, large matrices
    • Optimize [Ax, A^2x, A^3x, … , A^kx] on sequential machines
    • Optimize [Ax, A^2x, A^3x, … , A^kx] on parallel machines

Suggested projects (3)
• Evaluate new languages on applications (KY)
  • UPC or Titanium
  • UPC for asynchrony, overlapping communication & computation
  • ScaLAPACK in UPC
  • Use UPC-based 3D FFT in your application
  • Optimize existing 1D FFT in UPC, to use 3D techniques
• Porting, Evaluating parallel systems software (KY)
  • Port UPC to RAMP
  • Port GASNET to Blue Gene, evaluate
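To make the kernel [Ax, A^2x, … , A^kx] from Suggested projects (2) more concrete, here is a minimal Python sketch. It is not the course's implementation. It assumes a tridiagonal test matrix stored in CSR form, and the names spmv, matrix_powers, local_powers, and tridiagonal are hypothetical. The baseline applies SpMV k times, which in a parallel setting would mean one neighbor exchange per power. The second routine illustrates one possible way to reach O(1) messages: a processor owning rows lo..hi-1 fetches k ghost entries of x on each side once, then computes its rows of every power locally, at the price of some redundant arithmetic.

```python
from typing import Dict, List, Tuple

# CSR storage as (values, column indices, row pointers); an illustrative choice.
CSR = Tuple[List[float], List[int], List[int]]


def tridiagonal(n: int) -> CSR:
    """Build the n-by-n stencil matrix [-1, 2, -1] in CSR form (assumed test case)."""
    vals: List[float] = []
    cols: List[int] = []
    rowptr = [0]
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                vals.append(v)
                cols.append(j)
        rowptr.append(len(vals))
    return vals, cols, rowptr


def spmv(a: CSR, x: List[float]) -> List[float]:
    """Compute y = A x for a CSR matrix A."""
    vals, cols, rowptr = a
    y = [0.0] * (len(rowptr) - 1)
    for i in range(len(y)):
        for p in range(rowptr[i], rowptr[i + 1]):
            y[i] += vals[p] * x[cols[p]]
    return y


def matrix_powers(a: CSR, x: List[float], k: int) -> List[List[float]]:
    """Baseline: return [A x, ..., A^k x] using k separate SpMVs."""
    out = []
    v = x
    for _ in range(k):
        v = spmv(a, v)
        out.append(v)
    return out


def local_powers(a: CSR, x: List[float], k: int, lo: int, hi: int) -> List[List[float]]:
    """Rows lo..hi-1 of [A x, ..., A^k x], reading x only on [lo-k, hi+k).

    Valid when each row of A touches only columns i-1, i, i+1 (tridiagonal).
    The single read of the widened slice of x stands in for one message.
    """
    vals, cols, rowptr = a
    n = len(rowptr) - 1
    glo, ghi = max(0, lo - k), min(n, hi + k)
    v: Dict[int, float] = {i: x[i] for i in range(glo, ghi)}
    out = []
    for s in range(1, k + 1):
        # The region that can still be computed shrinks by one row per side per step.
        rlo, rhi = max(0, lo - (k - s)), min(n, hi + (k - s))
        w: Dict[int, float] = {}
        for i in range(rlo, rhi):
            w[i] = sum(vals[p] * v[cols[p]] for p in range(rowptr[i], rowptr[i + 1]))
        out.append([w[i] for i in range(lo, hi)])
        v = w
    return out
```

Calling local_powers(a, x, 3, 4, 8) on a 12-row matrix should reproduce rows 4 through 7 of each vector returned by matrix_powers(a, x, 3), while reading only x[1] through x[10]. For general sparse matrices the ghost region depends on the graph of A rather than a fixed width k, which is part of what makes the optimization projects nontrivial.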

