CS 152 Computer Architecture and Engineering Lecture 17 Synchronization and Sequential Consistency Krste Asanovic Electrical Engineering and Computer Sciences University of California Berkeley http www eecs berkeley edu krste http inst cs berkeley edu cs152 April 4 2011 CS152 Spring 2011 Last Time Lecture 16 GPUs Data Level Parallelism the least flexible but cheapest form of machine parallelism and matches application demands Graphics processing units have developed generalpurpose processing capability for use outside of traditional graphics functionality GP GPUs SIMT model presents programmer with illusion of many independent threads but executes them in SIMD style on a vector like multilane engine Complex control flow handled with hardware to turn branches into mask vectors and stack to remember threads on alternate path No scalar processor so threads do redundant work unit stride loads and stores recovered via hardware memory coalescing April 4 2011 CS152 Spring 2011 2 Uniprocessor Performance SPECint 3X 10000 1000 From Hennessy and Patterson Computer Architecture A Quantitative Approach 4th edition 2006 year 52 year 100 Performance 10 vs VAX 11 780 25 year 1 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 VAX 25 year 1978 to 1986 RISC x86 52 year 1986 to 2002 3 CS152 Spring 09 RISC x86 year 2002 to present April 4 2011 CS152 Spring 2011 Parallel Processing D j vu all over again today s processors are nearing an impasse as technologies approach the speed of light David Mitchell The Transputer The Time Is Now 1989 Transputer had bad timing Uniprocessor performance Procrastination rewarded 2X seq perf 1 5 years We are dedicating all of our future product development to multicore designs This is a sea change in computing Paul Otellini President Intel 2005 All microprocessor companies switch to MP 2 CPUs 2 yrs Procrastination penalized 2X sequential perf 5 yrs Even handheld systems moving to multicore Nintendo 3DS iPad 2 iPhone5 have two cores each Next Playstation Portable NGP has four cores April 4 2011 CS152 Spring 2011 4 Symmetric Multiprocessors Processor Processor CPU Memory bus bridge Memory I O bus I O controller symmetric All memory is equally far away from all processors Any processor can do any I O set up a DMA transfer April 4 2011 I O controller I O controller Graphics output CS152 Spring 2011 Networks 5 Synchronization The need for synchronization arises whenever there are concurrent processes in a system even in a uniprocessor system producer consumer Producer Consumer A consumer process must wait until the producer process has produced data Mutual Exclusion Ensure that only one process uses a resource at a given time P1 P2 Shared Resource April 4 2011 CS152 Spring 2011 6 A Producer Consumer Example Producer tail head Rtail Rtail Producer posting Item x Load Rtail tail Store Rtail x Rtail Rtail 1 Store tail Rtail The program is written assuming instructions are executed in order April 4 2011 Consumer Rhead R Consumer Load Rhead head spin Load Rtail tail if Rhead Rtail goto spin Load R Rhead Rhead Rhead 1 Store head Rhead process R Problems CS152 Spring 2011 7 A Producer Consumer Example continued Producer posting Item x Load Rtail tail 1Store Rtail x Rtail Rtail 1 2Store tail R tail Can the tail pointer get updated before the item x is stored Consumer Load Rhead head spin Load Rtail tail 3 if Rhead Rtail goto spin 4 Load R Rhead Rhead Rhead 1 Store head Rhead process R Programmer assumes that if 3 happens after 2 then 4 happens after 1 Problem sequences are 2 3 4 1 4 1 2 3 April 4 2011 CS152 Spring 2011 8 Sequential Consistency A Memory Model P P P P P P M A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order and the operations of each individual processor appear in the order specified by the program Leslie Lamport Sequential Consistency arbitrary order preserving interleaving of memory references of sequential programs April 4 2011 CS152 Spring 2011 9 Sequential Consistency Sequential concurrent tasks T1 T2 Shared variables X Y initially X 0 Y 10 T1 T2 Store X 1 X 1 Load R1 Y Store Y 11 Y 11 Store Y R1 Y Y Load R2 X Store X R2 X X what are the legitimate answers for X and Y X Y 1 11 0 10 1 10 0 11 If y is 11 then x cannot be 0 April 4 2011 CS152 Spring 2011 10 Sequential Consistency Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies What are these in our example T1 T2 Store X 1 X 1 Load R1 Y Store Y 11 Y 11 Store Y R1 Y Y Load R2 X additional SC requirements Store X R2 X X Does can a system with caches or out of order execution capability provide a sequentially consistent view of the memory more on this later April 4 2011 CS152 Spring 2011 11 Multiple Consumer Example Producer tail head Rtail R Rhead R Consumer Rtail 1 Consumer Rtail 2 Producer posting Item x Load Rtail tail Store Rtail x Rtail Rtail 1 Store tail Rtail Critical section Needs to be executed atomically by one consumer locks April 4 2011 Rhead Consumer Load Rhead head spin Load Rtail tail if Rhead Rtail goto spin Load R Rhead Rhead Rhead 1 Store head Rhead process R What is wrong with this code CS152 Spring 2011 12 Locks or Semaphores E W Dijkstra 1965 A semaphore is a non negative integer with the following operations P s if s 0 decrement s by 1 otherwise wait V s increment s by 1 and wake up one of the waiting processes P s and V s must be executed atomically i e without interruptions or interleaved accesses to s by other processors Process i P s critical section V s April 4 2011 initial value of s determines the maximum no of processes in the critical section CS152 Spring 2011 13 Implementation of Semaphores Semaphores mutual exclusion can be implemented using ordinary Load and Store instructions in the Sequential Consistency memory model However protocols for mutual exclusion are difficult to design Simpler solution atomic read modify write instructions Examples m is a memory location R is a register Test Set m R R M m if R 0 then M m 1 April 4 2011 Fetch Add m RV R R M m M m R RV CS152 Spring 2011 Swap m R Rt M m M m R R Rt 14 CS152 Administrivia Quiz 4 Monday April 11 VLIW Multithreading Vector and GPUs Covers lectures L13 L16 and associated readings PS 4 Lab 4 April 4 2011 CS152 Spring 2011 15 Multiple Consumers Example using the Test Set Instruction P Test Set mutex Rtemp if Rtemp 0 goto P
View Full Document
Unlocking...