CS 152 Computer Architecture and Engineering Lecture 19 Synchronization and Sequential Consistency Krste Asanovic Electrical Engineering and Computer Sciences University of California Berkeley http www eecs berkeley edu krste http inst cs berkeley edu cs152 Time processor cycle Summary Multithreaded Categories Superscalar 4 17 2008 Fine Grained Coarse Grained Multiprocessing Simultaneous Multithreading Thread 1 Thread 3 Thread 5 Thread 2 Thread 4 Idle slot CS152 Spring 08 2 Uniprocessor Performance SPECint 3X Performance vs VAX 11 780 10000 1000 From Hennessy and Patterson Computer Architecture A Quantitative Approach 4th edition 2006 year 52 year 100 10 25 year 1 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 VAX 25 year 1978 to 1986 RISC x86 52 year 1986 to 2002 3 RISC x86 year 2002 to present CS152 Spring 08 4 17 2008 D j vu all over again today s processors are nearing an impasse as technologies approach the speed of light David Mitchell The Transputer The Time Is Now 1989 Transputer had bad timing Uniprocessor performance Procrastination rewarded 2X seq perf 1 5 years We are dedicating all of our future product development to multicore designs This is a sea change in computing Paul Otellini President Intel 2005 All microprocessor companies switch to MP 2X CPUs 2 yrs Procrastination penalized 2X sequential perf 5 yrs Manufacturer Year AMD 07 Intel 07 IBM 07 Sun 07 Processors chip 4 2 2 8 Threads Processor 1 1 2 8 Threads chip 4 17 2008 4 2 4 64 CS152 Spring 08 4 Symmetric Multiprocessors Processor Processor CPU Memory bus bridge Memory I O bus I O controller symmetric All memory is equally far away from all processors Any processor can do any I O set up a DMA transfer 4 17 2008 I O controller I O controller Graphics output Networks 5 CS152 Spring 08 Synchronization The need for synchronization arises whenever there are concurrent processes in a system even in a uniprocessor system Forks and Joins In parallel programming a parallel process may want to wait until several events have occurred Producer Consumer A consumer process must wait until the producer process has produced data Exclusive use of a resource Operating system has to ensure that only one process uses a resource at a given time 4 17 2008 CS152 Spring 08 fork P2 P1 join producer consumer 6 A Producer Consumer Example Producer tail head Rtail Rtail Producer posting Item x Load Rtail tail Store Rtail x Rtail Rtail 1 Store tail Rtail The program is written assuming instructions are executed in order 4 17 2008 Consumer Rhead R Consumer Load Rhead head spin Load Rtail tail if Rhead Rtail goto spin Load R Rhead Rhead Rhead 1 Store head Rhead process R Problems CS152 Spring 08 7 A Producer Consumer Example continued Producer posting Item x Load Rtail tail 1 Store Rtail x Rtail Rtail 1 Store tail Rtail 2 Consumer Load Rhead head 3 spin Load Rtail tail if Rhead Rtail goto spin Load R Rhead 4 Rhead Rhead 1 Store head Rhead Can the tail pointer get updated process R before the item x is stored Programmer assumes that if 3 happens after 2 then 4 happens after 1 Problem sequences are 2 3 4 1 4 1 2 3 4 17 2008 CS152 Spring 08 8 Sequential Consistency A Memory Model P P P P P P M A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order and the operations of each individual processor appear in the order specified by the program Leslie Lamport Sequential Consistency arbitrary order preserving interleaving of memory references of sequential programs 4 17 2008 9 CS152 Spring 08 Sequential Consistency Sequential concurrent tasks T1 T2 Shared variables X Y initially X 0 Y 10 T1 Store X 1 X 1 Store Y 11 Y 11 T2 Load R1 Y Store Y R1 Y Y Load R2 X Store X R2 X X what are the legitimate answers for X and Y X Y 1 11 0 10 1 10 0 11 4 17 2008 CS152 Spring 08 10 Sequential Consistency Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies What are these in our example T1 Store X 1 X 1 Store Y 11 Y 11 T2 Load R1 Y Store Y R1 Y Y Load R2 X Store X R2 X X Does can a system with caches or out of order execution capability provide a sequentially consistent view of the memory more on this later 4 17 2008 11 CS152 Spring 08 Multiple Consumer Example Producer tail head Rtail Consumer 2 Producer posting Item x Load Rtail tail Store Rtail x Rtail Rtail 1 Store tail Rtail Critical section Needs to be executed atomically by one consumer locks 4 17 2008 Consumer 1 Rhead R Rtail Rhead R Rtail Consumer Load Rhead head spin Load Rtail tail if Rhead Rtail goto spin Load R Rhead Rhead Rhead 1 Store head Rhead process R What is wrong with this code CS152 Spring 08 12 Locks or Semaphores E W Dijkstra 1965 A semaphore is a non negative integer with the following operations P s if s 0 decrement s by 1 otherwise wait V s increment s by 1 and wake up one of the waiting processes P s and V s must be executed atomically i e without interruptions or interleaved accesses to s by other processors Process i P s critical section V s 4 17 2008 initial value of s determines the maximum no of processes in the critical section 13 CS152 Spring 08 Implementation of Semaphores Semaphores mutual exclusion can be implemented using ordinary Load and Store instructions in the Sequential Consistency memory model However protocols for mutual exclusion are difficult to design Simpler solution atomic read modify write instructions Examples m is a memory location R is a register Test Set m R R M m if R 0 then M m 1 4 17 2008 Fetch Add m RV R R M m M m R RV CS152 Spring 08 Swap m R Rt M m M m R R Rt 14 CS152 Administrivia CS194 6 class description Quiz results Quiz common mistakes 4 17 2008 CS152 Spring 08 15 CS194 6 Digital Systems Project Laboratory Prerequisites EECS 150 Units Credit 4 may be taken for a grade Meeting times M 10 30 12 lecture F 10 12 lab Instructor TBA Fall 2008 CS 194 is a capstone digital design project course The projects are team projects with 2 4 students per team Teams propose a project in a detailed design document implement the project in Verilog map it to a state of the art FPGA based platform and verify its correct operation using a test vector suite Projects may be of two types general purpose computing systems based on a standard ISA for example a pipelined implementation of a subset of the MIPS ISA with caches and a DRAM interface or special purpose digital
View Full Document
Unlocking...