CPE 631 Lecture 21: Multiprocessors

Aleksandar Milenković, milenka@ece.uah.edu
Electrical and Computer Engineering
University of Alabama in Huntsville

CPE 631 AM
14/01/19  UAH CPE631


Review: Small-Scale Shared Memory

- Caches serve to:
  - Increase bandwidth versus bus/memory
  - Reduce latency of access
  - Valuable for both private data and shared data
- What about cache consistency?

  Time  Event                   A    B    X (memory)
  0                                       1
  1     CPU A reads x           1         1
  2     CPU B reads x           1    1    1
  3     CPU A writes 0 to x     0    1    0


Snoopy Cache State Machine III

State machine for CPU requests for each cache block and for bus requests for each cache block.

[Figure: state diagram with cache states Invalid, Shared (read only), and Exclusive (read/write). Edges are labeled with CPU events (read hit, read miss, write hit, write miss), bus events (read miss for this block, write miss for this block), and the resulting actions (place read miss on bus, place write miss on bus, write back block, abort memory access).]


MESI: CPU Requests

[Figure: state diagram with states Invalid, Exclusive, Shared, and Modified (read/write). Edge labels recoverable from the slide: CPU read hit; CPU read, BusRd with NoSh; CPU write, BusRdEx; CPU read miss, BusRd with NoSh; CPU read miss, BusWB and BusRd with NoSh; CPU read miss, BusWB and BusRd with Sh; CPU write hit; CPU write miss, BusRdEx; CPU write hit, BusInv.]


MESI: Bus Requests

[Figure: state diagram with states Invalid, Exclusive, Shared, and Modified (read/write). Edge labels recoverable from the slide: BusRdEx; BusRd with Sh; BusRdEx with BusWB; BusRd with BusWB.]


Fundamental Issues

- 3 issues to characterize parallel machines:
  1) Naming
  2) Synchronization
  3) Performance: latency and bandwidth (covered earlier)


Fundamental Issue #1: Naming

- Naming: how to solve large problem fast
  - what data is shared
  - how it is addressed
  - what operations can access data
  - how processes refer to each other
- Choice of naming affects code produced by a compiler: via load, where just remember address, or keep track of processor number and local virtual address for msg. passing
- Choice of naming affects replication of data: via load in cache memory hierarchy, or via SW replication and consistency


Fundamental Issue #1: Naming (cont.)

- Global physical address space: any processor can generate, address, and access it in a single operation
  - memory can be anywhere: virtual addr. translation handles it
- Global virtual address space: if the address space of each process can be configured to contain all shared data of the parallel program
- Segmented shared address space: locations are named <process number, address> uniformly for all processes of the parallel program


Fundamental Issue #2: Synchronization

- To cooperate, processes must coordinate
- Message passing is implicit coordination with transmission or arrival of data
- Shared address => additional operations to explicitly coordinate: e.g., write a flag, awaken a thread, interrupt a processor


Summary: Parallel Framework

- Layers:
  - Programming Model
  - Communication Abstraction
  - Interconnection SW/OS
  - Interconnection HW
- Programming Model:
  - Multiprogramming: lots of jobs, no communication
  - Shared address space: communicate via memory
  - Message passing: send and receive messages
  - Data Parallel: several agents operate on several data sets simultaneously and then exchange information globally and simultaneously (shared or message passing)
- Communication Abstraction:
  - Shared address space: e.g., load, store, atomic swap
  - Message passing: e.g., send, receive library calls
  - Debate over this topic (ease of programming, scaling) => many hardware designs 1:1 programming model


Larger MPs

- Separate memory per processor
- Local or remote access via memory controller
- One cache coherency solution: non-cached pages
- Alternative: directory per cache that tracks state of every block in every cache
  - Which caches have copies of block, dirty vs. clean, ...
- Info per memory block vs. per cache block?
  - PLUS: in memory => simpler protocol (centralized/one location)
  - MINUS: in memory => directory size scales with memory size rather than cache size
- Prevent directory as bottleneck? Distribute directory entries with memory, each keeping track of which Procs have copies of their blocks


Distributed Directory MPs

[Figure: nodes P0, P1, ..., Pn, each with a cache (C), memory (M), and I/O (IO), connected by an interconnection network. Legend: C = Cache, M = Memory, IO = Input/Output.]


Directory Protocol

- Similar to Snoopy Protocol: three states
  - Shared: >= 1 processors have data, memory up-to-date
  - Uncached: no processor has it; not valid in any cache
  - Exclusive: 1 processor (owner) has data; memory out-of-date
- In addition to cache state, must track which processors have data when in the shared state (usually bit vector, 1 if processor has copy)
- Keep it simple(r):
  - Writes to non-exclusive data => write miss
  - Processor blocks until access completes
  - Assume messages received and acted upon in order sent


Directory Protocol (cont.)

- No bus and don't want to broadcast:
  - interconnect no longer single arbitration point
  - all messages have explicit responses
- Terms: typically 3 processors involved
  - Local node: where a request originates
  - Home node: where the memory location of an address resides
  - Remote node: has a copy of a cache block, whether exclusive or shared
- Example messages on next slide: P = processor number, A = address


Directory Protocol Messages

  Message type       Source           Destination      Msg content
  -----------------  ---------------  ---------------  -----------
  Read miss          Local cache      Home directory   P, A
      Processor P reads data at address A; make P a read sharer and arrange to send data back.
  Write miss         Local cache      Home directory   P, A
      Processor P writes data at address A; make P the exclusive owner and arrange to send data back.
  Invalidate         Home directory   Remote caches    A
      Invalidate a shared copy at address A.
  Fetch              Home directory   Remote cache     A
      Fetch the block at address A and send it to its home directory.
  Fetch/Invalidate   Home directory   Remote cache     A
      Fetch the block at address A and send it to its home directory; invalidate the block in the cache.
  Data value reply   Home directory   Local cache      Data
      Return a data value from the home memory (read miss response).
  Data write back    Remote cache     Home directory   A, Data
      Write back a data value for address A (invalidate response).


State Transition Diagram for an Individual Cache Block in a Directory-Based System

States
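
As a rough illustration of the directory protocol described above, the sketch below models the home directory's bookkeeping for a single memory block in Python. The class `DirectoryEntry`, its method names, and the `(message type, destination, address)` tuples are hypothetical names chosen for this example, not part of the lecture. It assumes the simplifications listed on the slides (messages handled in order, processor blocks until the access completes) and uses a Python set in place of the sharer bit vector.

```python
from enum import Enum


class DirState(Enum):
    UNCACHED = "Uncached"    # no processor has the block
    SHARED = "Shared"        # >= 1 processors have it, memory up to date
    EXCLUSIVE = "Exclusive"  # exactly 1 owner has it, memory out of date


class DirectoryEntry:
    """Home-directory state for one memory block (hypothetical helper)."""

    def __init__(self, address):
        self.address = address
        self.state = DirState.UNCACHED
        self.sharers = set()  # stands in for the per-block bit vector

    def read_miss(self, p):
        """Processor p sent a read miss; return the messages the home sends."""
        msgs = []
        if self.state is DirState.EXCLUSIVE:
            # Memory is stale: ask the owner to send the block home first.
            (owner,) = self.sharers
            msgs.append(("Fetch", owner, self.address))
        self.sharers.add(p)
        self.state = DirState.SHARED
        msgs.append(("Data value reply", p, self.address))
        return msgs

    def write_miss(self, p):
        """Processor p sent a write miss; p becomes the exclusive owner."""
        msgs = []
        if self.state is DirState.SHARED:
            for q in sorted(self.sharers - {p}):
                msgs.append(("Invalidate", q, self.address))
        elif self.state is DirState.EXCLUSIVE:
            (owner,) = self.sharers
            msgs.append(("Fetch/Invalidate", owner, self.address))
        self.sharers = {p}
        self.state = DirState.EXCLUSIVE
        msgs.append(("Data value reply", p, self.address))
        return msgs

    def data_write_back(self, p):
        """The owner p wrote the block back; nobody caches it afterwards."""
        if self.state is not DirState.EXCLUSIVE or self.sharers != {p}:
            raise ValueError("write back from a processor that is not the owner")
        self.sharers = set()
        self.state = DirState.UNCACHED


entry = DirectoryEntry(0x40)
entry.read_miss(0)          # Uncached -> Shared, sharers {0}
entry.read_miss(1)          # Shared, sharers {0, 1}
print(entry.write_miss(2))  # invalidates 0 and 1, then replies to 2
```

The sketch only covers the home node; the matching per-cache state machine and the network are left out, and requests that cannot occur in the protocol (such as a read miss from the current owner) are not guarded against.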
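
Returning to the MESI CPU-request diagram from earlier in the lecture, the transitions for a single cache block can be written as a small lookup function. This is a sketch of textbook MESI behavior using the bus-transaction names that appear on the slide (BusRd, BusRdEx, BusInv); the function name `mesi_cpu_request` and the `shared_line` parameter (standing in for the Sh/NoSh response) are assumptions made for this example. Misses that evict a different block, which is where BusWB appears on the slide, are deliberately left out.

```python
def mesi_cpu_request(state, event, shared_line=False):
    """Return (next_state, bus_transaction) for a CPU request to one block.

    state:       "M", "E", "S", or "I"
    event:       "read" or "write"
    shared_line: True if another cache signals that it holds the block (Sh)
    """
    if state == "I":
        if event == "read":
            # Read miss: load Shared if someone else has a copy, else Exclusive.
            return ("S" if shared_line else "E", "BusRd")
        return ("M", "BusRdEx")  # write miss: obtain an exclusive copy
    if state == "E":
        # Hits need no bus traffic; a write silently upgrades to Modified.
        return ("E", None) if event == "read" else ("M", None)
    if state == "S":
        # A write hit must invalidate the other copies first.
        return ("S", None) if event == "read" else ("M", "BusInv")
    if state == "M":
        return ("M", None)  # read and write hits stay in Modified
    raise ValueError(f"unknown state: {state!r}")


print(mesi_cpu_request("I", "read"))   # ('E', 'BusRd')
print(mesi_cpu_request("S", "write"))  # ('M', 'BusInv')
```

The point of the Exclusive state shows up in the `"E"` branch: a block that was read with no other sharers can later be written without any bus transaction.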

