Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Sangyeun Cho, Lei Jin (MICRO 2006)

Claims
  Manage the L2 cache through OS-level page allocation
  Flexible without complex hardware support
  Dynamically control data placement and cache sharing

Example Chip and Tile (Core)

L2 Cache Allocation
  Traditionally S = A mod N
  Proposed change: S = PPN mod N
  Allows the OS to choose the virtual-to-physical mapping (the PPN chooses the slice)
  Line Granularity
  Page Granularity

Congruence Group
  CGi = {physical page (PPN = j) | pmap(j) = i}
  Used to map a physical page to a core
  Convenient to use modulo-N on the PPN for pmap

Caching Schemes
  Private caching
    OS allocates private pages for Pi running on core i from CGi
  Shared caching
    Pages allocated from all congruence groups {CGi} (0 ≤ i ≤ N-1)
    Round robin or random

Hybrid Caching Scheme
  Partition {CGi} into K groups (K < N)
  Allocate pages from that group for a core within that group
  Allows sharing within a group

OS Modifications
  N free lists instead of a single free list
  Depends on the cache scheme
  Must consider existing data mappings
  Makes allocation more complex

Page Spreading
  When the local L2 slice is too small for the working set
  Need to consider data proximity to reduce the number of network hops
  Also must consider cache pressure
    Number of accessed pages / cache size

Data Proximity

Bloom Filter Monitor
  Keeps track of pages accessed
  Low overhead
    512-kB cache slice
    8-kB page
    512-byte filter
    <0.5% false-positive rate

Virtual Multicore!

Simulator Setup
  SimpleScalar
  16 tiles (4x4 mesh) (2-cycle hop)
  Single issue
  16-kB L1 I/D caches (1 cycle)
  512-kB L2 cache slice (8 cycles)
  2-GB main memory (300 cycles)

Results
  Parallel Workloads

How to Kill Cache Coherence
  Goal: Reduce overhead of cache coherence
  Also some of the messiness
  Granularity issue
  OS independent
  Lower storage overhead and
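
As an illustration of the Cho and Jin scheme summarized above, the following is a minimal Python sketch of the two slice-selection rules (S = A mod N versus S = PPN mod N) and of an OS allocator that keeps one free list per congruence group. Everything beyond those formulas is an assumption made for illustration: the 64-byte line size, the class name PageAllocator and its method names, the use of contiguous core clusters for the hybrid scheme, and the "most free pages" choice within a cluster are not specified in the slides.

```python
from collections import deque

PAGE_SIZE = 8 * 1024   # 8-kB pages, as on the Bloom Filter Monitor slide
LINE_SIZE = 64         # ASSUMPTION: line size is not given in the slides
N_SLICES = 16          # 16 tiles, as on the Simulator Setup slide


def slice_by_line(addr, n=N_SLICES):
    """Traditional line-granularity mapping: S = A mod N (A = line address)."""
    return (addr // LINE_SIZE) % n


def slice_by_page(addr, n=N_SLICES):
    """Proposed page-granularity mapping: S = PPN mod N."""
    return (addr // PAGE_SIZE) % n


class PageAllocator:
    """Hypothetical allocator with N free lists, one per congruence group."""

    def __init__(self, num_pages, n=N_SLICES):
        self.n = n
        self.free = [deque() for _ in range(n)]
        for ppn in range(num_pages):
            # pmap(j) = j mod N, the convenient choice named in the slides
            self.free[ppn % n].append(ppn)
        self.next_rr = 0  # round-robin cursor for shared caching

    def _take(self, group):
        # Raises IndexError when the group is exhausted; a real OS would
        # fall back to another group (page spreading) instead.
        return self.free[group].popleft()

    def alloc_private(self, core):
        """Private caching: pages for core i come only from CG_i."""
        return self._take(core)

    def alloc_shared(self):
        """Shared caching: round robin over all congruence groups."""
        group = self.next_rr
        self.next_rr = (self.next_rr + 1) % self.n
        return self._take(group)

    def alloc_hybrid(self, core, k):
        """Hybrid caching: K clusters; allocate within the core's cluster.

        ASSUMPTION: clusters are contiguous ranges of N/K cores (k divides n),
        and the member group with the most free pages is chosen.
        """
        size = self.n // k
        first = (core // size) * size
        members = range(first, first + size)
        group = max(members, key=lambda g: len(self.free[g]))
        return self._take(group)
```

With page-granularity mapping, every line of a page lands in the same slice, so the allocator's choice of PPN is what decides where the data lives. The three allocation methods differ only in which free lists they are allowed to draw from, which is why the slides describe the OS change as moving from a single free list to N free lists.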