DOC PREVIEW
UMD CMSC 714 - Improving Memory Hierarchy Performance for Irregular Applications

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1Improving Memory Hierarchy Performance for IrregularApplications Using Data and Computation ReorderingsBy: John Mellor-Crummey, David Whalley, Ken KennedyPresented By: Saeed AlaeiOutline Memory hierarchies and performance. Data and computation reordering in general. Regular vs. Irregular applications. Space filling curves: Hilbert curve Morton curve Data and computation reordering approaches. Experimental resultsMemory performance Gap between CPU speed and memory speed has increased rapidly. Multi-level memory hierarchies are used to address this memory access bottleneck. Irregular applications underutilize the memory hierarchies. Data and computation reordering can greatly help.Data reorderingfor(i=0;i<n;i++)for(j=0;j<n;j++)for(k=0;k<n;k++)C[i][j]+=A[i][k]*B[k][j];for(i=0;i<n;i++)for(j=0;j<n;j++)for(k=0;k<n;k++)C[i][j]+=A[i][k]*BT[j][k];Computation reorderingfor(i=0;i<n;i++)for(j=0;j<n;j++)for(k=0;k<n;k++)C[i][j]+=A[i][k]*B[k][j];for(i=0;i<n;i++)for(k=0;k<n;k++)for(j=0;j<n;j++)C[i][j]+=A[i][k]*B[k][j];Regular vs. Irregular Applications Regular applications: Static analysis and fixed access pattern. Techniques: Loop blocking. Data prefetching. Instances: Matrix multiplication/inversion.Irregular applications: Dynamic analysis, access pattern unknown until runtime. Proposed techniques: Dynamic data reordering before major computation phase. Computations reordering by blocking.2Irregular applications Poor spatial and temporal locality. Bandwidth and latency problems: Cache miss. TLB miss. Instances: N-Body simulation.N-BodySpace Filling Curve A space-filling curve for some finite space of d dimensions (d ≥ 2) is a continuous, non-smooth curve that passes arbitrarily close to every point.  Each point in a d-dimensional space can be mapped to the nearest position along a 1-dimensional spacefilling curve by applying a sequence of bit-level logical operations to its d-dimensional coordinatesHilbert CurvesMorton CurvesX=1010101Y=1010010M=11001100011001Reordering Data Reordering Computation Reordering3Data Reordering Approaches Changes the order of the elements of data. Doesn’t change the order in which the elements. are referenced Approaches: First touch reordering. Space filling curve reordering.First Touch Data Reordering It is a greedy approach to improve spatial localities in irregular applications. Changes the order of particle information Improves the locality in time by reordering the particle list.First Touch Reordering (cont’d) Advantages: Simple to implement. Requires linear time. Disadvantages: Interaction list must be known in advance.Space Filling Curve Data Reordering Steps: Particle coordinates must be mapped to positions in space filling curve. Particles are sorted in ascending order by their position on thecurve. Advantages: Can be done prior to knowing the order of computation. Allows some computation reordering (if computed before the interaction list). Disadvantages: May require more overhead due to sorting.Space Filling Curve Data Reordering (cont’d) Increases the spatial locality.Computation Reordering Changes the order in which the elements. are referenced. Doesn’t Changes the order of the elements of data. Can improve both temporal and spatial locality of access. Approaches: Space filling curve reordering. Reordering by blocking.4Space Filling Curve Computation Reordering Changes the order of elements in the interaction list. The order of particle information remains unchanged. Improves temporal locality.Computation Reordering by BlockingComputation Reordering by Blocking (cont’d)01 2345PPPPPPPPxPPPPzPPyPPP0123 Blocking can also be done base on different approaches (e.g. by taking the address).Computation Reordering by Blocking (cont’d) Blocking can be extended to multiple levels. A blocking factor depends on architectural characteristics: Size of cache lines or page size for TLB. The number of sets in the cache. Associativity. Replacement policy. …Computation Reordering by Blocking (cont’d) The recursive divide and conquer approach can be used to achieve a machine independent blocking. Computation Reordering by Blocking (cont’d) For each pair [Pi, Pj] in the interaction list, we compute the interleaving of [block_of(Pi) , block_of(Pj)]. This is equivalent to using a Morton curve. A Hilbert curve can also be used. Both Hilbert and Morton curve give a recursive blocking.5Applying The Techniques Particle codes: Moldyn (a synthetic benchmark). Magi CHADMoldyn A synthetic benchmark for molecular dynamics simulation. Closely resembles the N-Body example. The time consuming part is the computeForcesfunction called on list of interactions. The experiment: Performed SGI O2 based on R10000 MIPS. 256,000 particles, over 27 million interactions.Moldyn – base line programBlock NumberInteraction NumberMoldyn – Hilbert CurveBlock NumberInteraction NumberMoldyn – Multi-level – first 10KBlock NumberInteraction NumberMoldyn – Multi-level – first 100KBlock NumberInteraction Number6Moldyn – Multi-level – first 1MBlock NumberInteraction NumberMoldyn – Various ReorderingsMagi An application used by the U.S. Air Force to perform hydrodynamic computation. Description:Magi – Various ReorderingsCHAD A Parallel unstructured mesh application for simulating 3D fluid flows with chemical reactions and fuel sprays. Contains mostly dense vector operations. The results indicate that the original ordering is already


View Full Document

UMD CMSC 714 - Improving Memory Hierarchy Performance for Irregular Applications

Documents in this Course
MTOOL

MTOOL

7 pages

BOINC

BOINC

21 pages

Eraser

Eraser

14 pages

Load more
Download Improving Memory Hierarchy Performance for Irregular Applications
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Improving Memory Hierarchy Performance for Irregular Applications and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Improving Memory Hierarchy Performance for Irregular Applications 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?