CS 152 Computer Architecture and Engineering
Lecture 23: Synchronization
2005-11-17
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
TAs: David Marquardt and Udam Saini
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L23: Synchronization, UC Regents Fall 2005, UCB

Last Time: How Routers Work

2. Forwarding engine determines the next hop for the packet, and returns next-hop data to the line card, together with an updated header.

Recall: Two CPUs sharing memory

In earlier lectures, we pretended it was easy to let several CPUs share a memory system.
In fact, it is an architectural challenge.
Even letting several threads on one machine ...

Today: Hardware Thread Support

Producer/Consumer: One thread writes A, one thread reads A.
Locks: Two threads share write access to A.
On Tuesday: Multiprocessor memory system design and synchronization issues.
Tuesday is a simplified overview; graduate-level architecture courses spend weeks on ...

How 2 threads share a queue

We begin with an empty queue.
[Figure: words in memory, with Tail and Head pointers and the direction of higher address numbers marked.]
Thread 1 (T1) adds data to the tail of the queue: the "Producer" thread.
Thread 2 (T2) takes data from the head of the queue: the "Consumer" thread.

Producer adding x to the queue

[Figure: words in memory. Before: empty queue. After: queue holds x.]

T1 code (producer):
    ORi  R1, R0, xval    # Load x value into R1
    LW   R2, tail(R0)    # Load tail pointer into R2
    SW   R1, 0(R2)       # Store x into queue
    ADDi R2, R2, 4       # Shift tail by one word
    SW   R2, tail(R0)    # Update tail memory addr

Producer adding y to the queue

[Figure: words in memory. Before: queue holds x. After: queue holds y and x.]

T1 code (producer):
    ORi  R1, R0, yval    # Load y value into R1
    LW   R2, tail(R0)    # Load tail pointer into R2
    SW   R1, 0(R2)       # Store y into queue
    ADDi R2, R2, 4       # Shift tail by one word
    SW   R2, tail(R0)    # Update tail memory addr

Consumer reading the queue

[Figure: words in memory. Before: queue holds y and x. After: queue holds y.]

T2 code (consumer):
          LW   R3, head(R0)    # Load head pointer into R3
    spin: LW   R4, tail(R0)    # Load tail pointer into R4
          BEQ  R4, R3, spin    # If queue empty, wait
          LW   R5, 0(R3)       # Read x from queue into R5
          ADDi R3, R3, 4       # Shift head by one word
          SW   R3, head(R0)    # Update head pointer

What can go wrong?

[Figure: words in memory, higher addresses marked. Before: queue holds y and x. After: queue holds y.]

T1 code (producer):
    ORi  R1, R0, x       # Load x value into R1
    LW   R2, tail(R0)    # Load tail pointer into R2
    SW   R1, 0(R2)       # (1) Store x into queue
    ADDi R2, R2, 4       # Shift tail by one word
    SW   R2, tail(R0)    # (2) Update tail pointer

T2 code (consumer):
          LW   R3, head(R0)    # Load head pointer into R3
    spin: LW   R4, tail(R0)    # (3) Load tail pointer into R4
          BEQ  R4, R3, spin    # If queue empty, wait
          LW   R5, 0(R3)       # (4) Read x from queue into R5
          ADDi R3, R3, 4       # Shift head by one word
          SW   R3, head(R0)    # Update head pointer

What if the order is 2, 3, 4, 1? Then x is read before it is written!

Leslie Lamport: Sequential Consistency

Sequential Consistency: As if each thread takes turns executing, and instructions in each thread execute in program order.

T1 code (producer):
    ORi  R1, R0, x       # Load x value into R1
    LW   R2, tail(R0)    # Load queue tail into R2
    SW   R1, 0(R2)       # (1) Store x into queue
    ADDi R2, R2, 4       # Shift tail by one word
    SW   R2, tail(R0)    # (2) Update tail memory addr

T2 code (consumer):
          LW   R3, head(R0)    # Load queue head into R3
    spin: LW   R4, tail(R0)    # (3) Load queue tail into R4
          BEQ  R4, R3, spin    # If queue empty, wait
          LW   R5, 0(R3)       # (4) Read x from queue into R5
          ADDi R3, R3, 4       # Shift head by one word
          SW   R3, head(R0)    # Update head memory addr

Legal orders: 1, 2, 3, 4 or 1, 3, 2, 4 or 3, 4, 1, 2 ... but not 2, 3, 1, 4!
Sequentially consistent architectures get the right answer, but give up many optimizations.

Efficient alternative: Memory barriers

In the general case, the machine is not sequentially consistent.
When needed, a memory barrier may be added to the program (a fence).
All memory operations before the fence complete, then memory operations after the fence begin.

    ORi  R1, R0, x
    LW   R2, tail(R0)
    SW   R1, 0(R2)       # (1)
    MEMBAR
    ADDi R2, R2, 4
    SW   R2, tail(R0)    # (2)

Ensures 1 completes before 2 takes effect.
MEMBAR is expensive, but you only pay for it when you use it.
Many MEMBAR variations for efficiency (versions that only affect loads or stores, certain memory ...

Producer/consumer memory fences

[Figure: words in memory, higher addresses marked. Before: queue holds y and x. After: queue holds y.]

T1 code (producer):
    ORi  R1, R0, x       # Load x value into R1
    LW   R2, tail(R0)    # Load queue tail into R2
    SW   R1, 0(R2)       # (1) Store x into queue
    MEMBAR
    ADDi R2, R2, 4       # Shift tail by one word
    SW   R2, tail(R0)    # (2) Update tail memory addr

T2 code (consumer):
          LW   R3, head(R0)    # Load queue head into R3
    spin: LW   R4, tail(R0)    # (3) Load queue tail into R4
          BEQ  R4, R3, spin    # If queue empty, wait
          MEMBAR
          LW   R5, 0(R3)       # (4) Read x from queue into R5
          ADDi R3, R3, 4       # Shift head by one word
          SW   R3, head(R0)    # Update head memory addr

Ensures 1 happens before 2, and 3 happens before 4.

Reminder: Final Checkoff this Friday!

Final report due following Monday, 11:59 PM.
TAs will provide "secret" MIPS machine code tests.
Bonus points if these tests run by end of section. If not, TAs give you test code to use over the weekend.
Mid-term project presentations after ...

CS 152: What's left

Monday 11/21: Final report due, 11:59 PM.
Class as normal on Tuesday, then Thanksgiving.
Tuesday 11/29: Architecture Cal. Team evaluations due 11:59 PM Tuesday.
Thursday 12/1: Mid-term review in class.
Tuesday 12/6: Mid-term II, 6:00-9:00 PM. No class 11-12:30 that day.
Thursday 12/8: Final presentations.
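
Sketch: replaying the event orders from "What can go wrong"

The claims about event orders 1 to 4 (2, 3, 4, 1 reads x before it is written; orders that keep 1 before 2 and 3 before 4 are safe) can be checked by brute force. The Python below is a minimal sketch, not part of the lecture. It assumes a toy memory with one queue slot, an initially empty queue (head == tail == 0), and a single pass through the consumer loop. The names replay, respects_program_order, ALL_ORDERS, ORDERED and STALE are hypothetical helpers introduced for illustration.

```python
from itertools import permutations

# Event labels follow the slides:
#   1 = producer stores x into the queue slot   (SW R1, 0(R2))
#   2 = producer stores the new tail pointer    (second SW in T1)
#   3 = consumer loads the tail pointer         (LW R4, tail(R0))
#   4 = consumer loads the queue slot           (LW R5, 0(R3))


def replay(order):
    """Replay one global order of events 1-4 on a toy memory.

    Assumptions (not from the slides): the queue starts empty with
    head == tail == 0, the slot holds None until x is stored, and the
    consumer makes a single pass through its loop.
    """
    slot, tail, head = None, 0, 0
    seen_tail, value = None, None
    for event in order:
        if event == 1:
            slot = "x"          # x becomes visible in the queue
        elif event == 2:
            tail = 1            # new tail pointer becomes visible
        elif event == 3:
            seen_tail = tail    # consumer samples the tail pointer
        elif event == 4:
            value = slot        # consumer samples the queue slot
    if seen_tail == head:
        return "spin"           # queue looked empty: consumer retries
    return "x" if value == "x" else "stale"


def respects_program_order(order):
    """True if 1 precedes 2 and 3 precedes 4, as the two fences enforce."""
    return order.index(1) < order.index(2) and order.index(3) < order.index(4)


ALL_ORDERS = list(permutations((1, 2, 3, 4)))
ORDERED = [o for o in ALL_ORDERS if respects_program_order(o)]
STALE = [o for o in ALL_ORDERS if replay(o) == "stale"]

if __name__ == "__main__":
    for o in ALL_ORDERS:
        tag = "in order" if respects_program_order(o) else "reordered"
        print(o, tag, replay(o))
```

In this model, every order that keeps 1 before 2 and 3 before 4 ends in either a correct read of x or a harmless retry, while the stale reads all come from orders in which the tail update becomes visible before the data store. Note that 2, 3, 1, 4 happens to read x correctly here even though it is not a sequentially consistent order; the fence is needed because nothing rules out 2, 3, 4, 1.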