CPE 631 Lecture 19: Multiprocessors
Aleksandar Milenković, milenka@ece.uah.edu
Electrical and Computer Engineering
University of Alabama in Huntsville
CPE 631 AM
13/01/19 UAH-CPE631

Review: Small-Scale Shared Memory
  - Caches serve to:
      - Increase bandwidth versus bus/memory
      - Reduce latency of access
      - Valuable for both private data and shared data
  - What about cache consistency?

  Time  Event                Cache A  Cache B  X (memory)
  0                                            1
  1     CPU A reads X        1                 1
  2     CPU B reads X        1        1        1
  3     CPU A writes 0 to X  0        1        0

  - After time 3, cache B still holds the stale value 1 while cache A and memory hold 0

What Does Coherency Mean?
  - Informally: "Any read of a data item must return the most recently written value"
      - This definition includes both coherence and consistency
      - Coherence: what values can be returned by a read
      - Consistency: when a written value will be returned by a read
  - A memory system is coherent if:
      - A read of X by P1 that follows a write of X by P1, with no writes of X by another processor occurring between these two events, always returns the value written by P1
      - A read of X by P1 that follows a write of X by another processor returns the written value, if the read and write are sufficiently separated and no other writes occur in between
      - Writes to the same location are serialized: two writes to the same location by any two CPUs are seen in the same order by all CPUs

Potential HW Coherence Solutions
  - Snooping Solution (Snoopy Bus)
      - Every cache that has a copy of the data also has a copy of the sharing status of the block
      - Processors snoop to see if they have a copy and respond accordingly
      - Requires broadcast, since caching information is at the processors
      - Works well with a bus (natural broadcast medium)
      - Dominates for small-scale machines (most of the market)
  - Directory-Based Schemes (discussed later)
      - Keep track of what is being shared in one centralized place (logically)
      - Distributed memory => distributed directory for scalability (avoids bottlenecks)
      - Send point-to-point requests to processors via the network
      - Scale better than snooping
      - Actually existed BEFORE snooping-based schemes

Basic Snoopy Protocols
  - Write Invalidate Protocol
      - A CPU has exclusive access to a data item before it writes that item
      - Write to shared data: an invalidate is sent to all caches, which snoop and invalidate any copies
      - Read miss:
          - Write-through: memory is always up to date
          - Write-back: snoop in caches to find the most recent copy
  - Write Update Protocol (typically write-through)
      - Write to shared data: broadcast on bus; processors snoop and update any copies
      - Read miss: memory is always up to date
  - Write serialization: the bus serializes requests
      - The bus is the single point of arbitration

Write Invalidate versus Update
  - Multiple writes to the same word with no intervening reads
      - Update: multiple broadcasts
  - For multiword cache blocks
      - Update: each word written in a cache block requires a write broadcast
      - Invalidate: only the first write to any word in the block requires an invalidation
  - Update has lower latency between write and read

Snooping Cache Variations

  Basic Protocol  Berkeley Protocol  Illinois Protocol  MESI Protocol
  Exclusive       Owned Exclusive    Private Dirty      Modified (private, ≠ Memory)
  Shared          Owned Shared       Private Clean      eXclusive (private, = Memory)
  Invalid         Shared             Shared             Shared (shared, = Memory)
                  Invalid            Invalid            Invalid

  - Owner can update via a bus invalidate operation
  - Owner must write back when the block is replaced in the cache
  - If a read is sourced from memory, then Private Clean; if a read is sourced from another cache, then Shared
  - Can write in cache if held Private Clean or Private Dirty

An Example Snoopy Protocol
  - Invalidation protocol, write-back cache
  - Each block of memory is in one state:
      - Clean in all caches and up to date in memory (Shared), OR
      - Dirty in exactly one cache (Exclusive), OR
      - Not in any caches
  - Each cache block is in one state (track these):
      - Shared: block can be read, OR
      - Exclusive: cache has the only copy; it is writeable and dirty, OR
      - Invalid: block contains no data
  - Read misses cause all caches to snoop the bus
  - Writes to a clean line are treated as misses

Snoopy Cache State Machine I
  - State machine for CPU requests, for each cache block

  From state              CPU request     Action                                           To state
  Invalid                 CPU read        Place read miss on bus                           Shared (read only)
  Invalid                 CPU write       Place write miss on bus                          Exclusive (read/write)
  Shared (read only)      CPU read hit    (none)                                           Shared (read only)
  Shared (read only)      CPU read miss   Place read miss on bus                           Shared (read only)
  Shared (read only)      CPU write       Place write miss on bus                          Exclusive (read/write)
  Exclusive (read/write)  CPU read hit    (none)                                           Exclusive (read/write)
  Exclusive (read/write)  CPU write hit   (none)                                           Exclusive (read/write)
  Exclusive (read/write)  CPU read miss   Write back block; place read miss on bus         Shared (read only)
  Exclusive (read/write)  CPU write miss  Write back cache block; place write miss on bus  Exclusive (read/write)

Snoopy Cache State Machine II
  - State machine for bus requests, for each cache block

  From state              Bus request                Action                                  To state
  Shared (read only)      Write miss for this block  (none)                                  Invalid
  Exclusive (read/write)  Write miss for this block  Write back block (abort memory access)  Invalid
  Exclusive (read/write)  Read miss for this block   Write back block (abort memory access)  Shared (read only)

Snoopy Cache State Machine III
  - State machine for CPU requests for each cache block and for bus requests for each cache block
  - (Figure: the transitions of State Machine I and State Machine II drawn together on one diagram over the cache states Invalid, Shared (read only) and Exclusive (read/write))

Example

  Step                P1                  P2                  Bus                         Memory
                      State  Addr  Value  State  Addr  Value  Action  Proc.  Addr  Value  Addr  Value
  P1: Write 10 to A1
  P1: Read A1
  P2: Read A1
  P2: Write 20 to A1
  P2: Write 40 to A2

  - Assumes the initial cache state is Invalid and that A1 and A2 map to the same cache block, but A1 ≠ A2
  - (Figure: state diagram over Invalid, Shared and Exclusive with arcs labeled CPU read hit, CPU read miss, CPU write (place write miss on bus), CPU write miss, CPU write hit, read miss on bus, write miss on bus, remote read (write back), remote write, remote write (write back))

Example: Step 1

  Step                P1                  P2                  Bus                         Memory
                      State  Addr  Value  State  Addr  Value  Action  Proc.  Addr  Value  Addr  Value
  P1: Write 10 to A1  Excl.  A1    10                         WrMs    P1     A1
  P1: Read A1
  P2: Read A1
  P2: Write 20 to A1
  P2: Write 40 to A2

  - Assumes the initial cache state is Invalid and that A1 and A2 map to the same cache block, but A1 ≠ A2
  - (Figure: the state diagram over Invalid, Shared and Exclusive with the active arrow highlighted; arc labels: CPU read hit, remote write, remote write (write back), read miss on bus, write miss on bus, remote read (write back), CPU read miss, CPU write: place write miss on …)
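
The five-step example above (P1 and P2 accessing A1 and A2) can be traced with a small simulation of the three-state write-back invalidation protocol. This is only a minimal sketch under stated assumptions, not the lecture's own code: the names `Line`, `SnoopyBus`, `read` and `write` are hypothetical, each CPU is given a single cache block (so A1 and A2 necessarily map to the same block), and memory is assumed to start with 0 at both addresses, which the slides do not specify. The bus event names WrMs, RdMs and WrBk abbreviate write miss, read miss and write back.

```python
from dataclasses import dataclass
from typing import Optional

INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"


@dataclass
class Line:
    """The single cache block of one CPU (hypothetical helper type)."""
    state: str = INVALID
    addr: Optional[str] = None
    value: Optional[int] = None


class SnoopyBus:
    """Hypothetical model: one bus, one memory, one cache block per CPU."""

    def __init__(self, n_cpus, memory):
        self.memory = dict(memory)            # assumed initial memory contents
        self.lines = [Line() for _ in range(n_cpus)]
        self.log = []                         # (action, cpu, addr) bus events

    def _write_back(self, cpu):
        line = self.lines[cpu]
        if line.state == EXCLUSIVE:           # only a dirty block is written back
            self.memory[line.addr] = line.value
            self.log.append(("WrBk", cpu, line.addr))

    def _place_miss(self, kind, requester, addr):
        """Put a read miss (RdMs) or write miss (WrMs) on the bus; others snoop."""
        self.log.append((kind, requester, addr))
        for cpu, line in enumerate(self.lines):
            if cpu == requester or line.state == INVALID or line.addr != addr:
                continue
            self._write_back(cpu)             # no-op unless this copy is Exclusive
            line.state = SHARED if kind == "RdMs" else INVALID

    def read(self, cpu, addr):
        line = self.lines[cpu]
        if line.state == INVALID or line.addr != addr:    # CPU read miss
            self._write_back(cpu)             # evict a dirty block first
            self._place_miss("RdMs", cpu, addr)
            line.state, line.addr, line.value = SHARED, addr, self.memory[addr]
        return line.value

    def write(self, cpu, addr, value):
        line = self.lines[cpu]
        if line.state != EXCLUSIVE or line.addr != addr:  # miss, or write to a clean line
            self._write_back(cpu)             # evict a dirty block first
            self._place_miss("WrMs", cpu, addr)
        line.state, line.addr, line.value = EXCLUSIVE, addr, value


# The five steps of the example; CPU 0 plays P1 and CPU 1 plays P2.
bus = SnoopyBus(2, {"A1": 0, "A2": 0})
bus.write(0, "A1", 10)
bus.read(0, "A1")
bus.read(1, "A1")
bus.write(1, "A1", 20)
bus.write(1, "A2", 40)
for event in bus.log:
    print(event)
```

In this sketch, P1's block ends up Invalid, P2 holds A2 = 40 in the Exclusive state, and memory holds A1 = 20 after P2's dirty copy of A1 is written back. The order in which a write back and the following miss appear in the log is a modeling choice of the sketch, not something the slides fix.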