Stanford EE 482C - Stream Cache Architectures for Irregular Stream Applications

Unformatted text preview:

EE482C ProjectStream Cache Architectures for Irregular Stream ApplicationsTimothy KnightArjun SinghJung Ho AhnMotivationIrregular stream applications Perform a data-dependent traversal of an arbitrary graph Pointer chasingIndex streams Method for traversing irregular streams on a `traditional’ stream architecture (i.e.) Imagine Inefficient- Wasted SRF space- Wasted memory bandwidthArchitectural enhancements Stream cache Indexed SRF … ?Stream Cache Amplifies memory bandwidth in presence of temporal locality of reference Avoids repeated memory accesses for same address Allowing clusters to issue memory requests through a cluster cache avoids replicating data in the SRF.Applications StudiedApplication 1 All updates committed after all computations done Pseudo code:// Computation phasefor each vertex v:v.newdata = kernel (v.data, v.neigh.data)// Update phasefor each vertex v:v.data = v.newdataApplications Studied (Cont.)Application 2 Updates committed as soon as computed Conceptually an advancing wave-front of computation in a DAG Chosen to create coherence issues Pseudo code:repeat until all vertices are updated:for each vertex v with valid predecessors:v.data = kernel (v.neigh.data)Architectures Modeled1. No cache (baseline) 16 clusters 8 banked memory2. Cache accessible from SRF3. Dedicated cache accessible from clusters4. Cache accessible from clusters and SRFAll cache models: Have 8 banks, matching the memory Have 1 word per cache line Are not coherentArchitectures Modeled (Cont.)LRFSRFLoc. DRAMsNetLRFSRFLoc. DRAMsNetCacheSRFLoc. DRAMsNetCacheALUs ALUs ALUsArchitecture 1 Architecture 2 Architecture 3LRF LRFSRFLoc. DRAMsNetCacheALUsArchitecture 4MemorySystemClustersHardware Costs• Cache in memory system 8 SRAM banks, associated logic and buffers Need 1 word of tag storage per address cached• Enabling cluster memory access Reorder buffer: 16x 2/3*-ported register files Requests FIFO / AG 8 new buses between memory and clusters• Dedicated cluster cache (additional costs) 2 full 8x8 crossbarsSimulation EnvironmentImplemented a ‘cycle-by-cycle performance simulator’: Not functionally cycle-accurate Modeled throughputs, latencies, and resource constraints of architectures Applications coded in `macrocode’ Average sim: 5,000,000 cycles in 15 minutes of real time Parameters:- Data sets: record size, graph size, connectivity, locality- Apps: kernel computation time, strip size, cache use- Model: throughputs, latencies, mem. sizes, number of nodes, cache organizationResultsSpeedup vs. K (app1)00.511.522.533.540 2000 4000 6000 8000 10000 12000K (cycles)Speedup1 2 3 4Results (cont.)Speedup vs. K (app2)00.511.522.50 2000 4000 6000 8000 10000 12000K (cycles)Speedup1 2 3 4Results (cont.)Speedup vs. ROB length (app2)00.20.40.60.811.21.41.61.820 100 200 300 400 500 600ROB length Speedup1 3 4Results (cont.)Other results observed: Increase of speedup with increasing degree Relationship between speedup and number of words of data per record Constant speedup with increasing number of nodes for small data sets; large data sets blow up our simulator with many nodes Constant speedup with increasing number of verticesConclusion A stream cache can improve performance on an irregular stream app. Speedups of up to 3.5 were observed in our experiments A cache is most useful when the kernel performs little computation on each record. Various pros and cons between different cache architectures The architecture which performs best is determined by the choice of application and data setDiscussionLimitations on speedup from a stream cache: Amdahl’s Law: only some of the memory requests exhibit temporal locality Bank conflicts Dependencies in application Available locality ‘Preprocessing’ overheadFuture Work Study cache performance on real applications in


View Full Document

Stanford EE 482C - Stream Cache Architectures for Irregular Stream Applications

Download Stream Cache Architectures for Irregular Stream Applications
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Stream Cache Architectures for Irregular Stream Applications and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Stream Cache Architectures for Irregular Stream Applications 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?