Computer Architecture and Engineering CS152 Quiz 6 May 8th 2008 Professor Krste Asanovic Name This is a closed book closed notes exam 80 Minutes 10 Pages Notes Not all questions are of equal difficulty so look over the entire exam and budget your time carefully Please carefully state any assumptions you make Please write your name on every page in the quiz You must not discuss a quiz s contents with students who have not yet taken the quiz If you have inadvertently been exposed to the quiz prior to taking it you must tell the instructor or TA You will get no credit for selecting multiple choice answers without giving explanations if the instructions ask you to explain your choice Writing name on each sheet 1 Point Question 1 32 Points Question 2 29 Points Question 3 18 Points TOTAL 80 Points Problem Q6 1 Implementing Directories 32 POINTS Ben Bitdiddle is implementing a directory based cache coherence invalidate protocol for a 64processor system He first builds a smaller prototype with only 4 processors to test out the cache coherence protocol described in Handout 6 To implement the list of sharers S kept by home he maintains a bit vector per cache block to keep track of all the sharers The bit vector has one bit corresponding to each processor in the system The bit is set to one if the processor is caching a shared copy of the block and zero if the processor does not have a copy of the block For example if Processors 0 and 3 are caching a shared copy of some data the corresponding bit vector would be 1001 Problem Q6 1 A 4 POINTS The bit vector worked well for the 4 processor prototype but when building the actual 64processor system Ben discovered that he did not have enough hardware resources Assume each cache block is 32 bytes What is the overhead of maintaining the sharing bit vector for a 4processor system as a fraction of data storage bits What is the overhead for a 64 processor system as a fraction of data storage bits Overhead for a 4 processor system Overhead for a 64 processor system Problem Q6 1 B 8 POINTS Since Ben does not have the resources to keep track of all potential sharers in the 64 processor system he decides to limit S to keep track of only 1 processor using its 6 bit ID as shown in Figure Q6 1 A single sharer scheme When there is a load ShReq request to a shared cache block Ben invalidates the existing sharer to make room for the new sharer home sends an InvReq to the existing sharer the existing sharer sends an InvRep to home home replaces the exiting sharer s ID with the new sharer s ID and sends a ShRep to the new sharer 6 Sharer ID Figure Q6 1 A Consider a 64 processor system To determine the efficiency of a full bit vector scheme and single sharer scheme fill in the number of invalidate requests that are generated by the protocols for each step in the following two sequences of events Assume cache block B is uncached initially R dir dir for both sequences Sequence 1 Processor 0 reads B Processor 1 reads B Processor 0 reads B Sequence 2 Processor 0 reads B Processor 1 reads B Processor 2 writes B Full bit vector scheme of invalidate requests 0 single sharer scheme of invalidate requests 0 Full bit vector scheme of invalidate requests 0 single sharer scheme of invalidate requests 0 Problem Q6 1 C 8 POINTS Ben thinks that he can improve his original scheme by adding an extra global bit to S as shown in Figure Q6 1 B global bit scheme The global bit is set when there is more than 1 processor sharing the data and zero otherwise 1 0 6 Sharer ID global Figure Q6 1 B When the global bit is set home stops keeping track of a specific sharer and assumes that all processors are potential sharers 1 1 6 XXXXXX global Figure Q6 1 C Consider a 64 processor system To determine the efficiency of the global bit scheme fill in the number of invalidate requests that are generated for each step in the following two sequences of events Assume cache block B is uncached initially i e R dir dir for both sequences Sequence 1 Processor 0 reads B Processor 1 reads B Processor 0 reads B Sequence 2 Processor 0 reads B Processor 1 reads B Processor 2 writes B global bit scheme of invalidate requests 0 global bit scheme of invalidate requests 0 Page 4 of 12 Problem Q6 1 D 12 POINTS Ben decides to modify the protocol from Handout 6 for his global bit scheme Problem 4 4 C for a 64 processor system Your job is to complete the following table Table Q6 1 1 for him Use the same assumptions for the interconnection network cache states home directory states and protocol messages as Handout 6 However R dir if dir now means that there is only one processor sharing the cache data global bit is unset and R all means the global bit is set Use k to represent the site that issued the received message For Tr dir and Tw id states use j to represent the site that issued the original protocol request ShReq ExReq No Current State Message Received Next State Action 1 R dir dir ShReq R k ShRep k 2 R dir dir ExReq W k ExRep k 3 R dir dir ShReq 4 R all ShReq 5 R dir dir ExReq 6 R all ExReq 7 W id ShReq Tw id WbReq id 8 Tr dir id dir InvRep Tr dir id nothing InvRep W j ExRep j 9 10 Tr dir dir k Tw id FlushRep Table Q6 1 1 Partial List of Home Directory State Transitions Page 5 of 12 Problem Q6 2 Snoopy Cache Coherent Shared Memory 29 POINTS This problem improves the snoopy cache coherence protocol presented in Handout 7 As a review of that protocol When multiple shared copies of a modified data block exist one of the caches owns the current copy of the data block instead of the memory the owner has the data block in the OS state When another cache tries to retrieve the data block from memory the owner uses cache to cache intervention CCI to supply the data block CCI provides a faster response relative to memory and reduces the memory bandwidth demands However when multiple shared copies of a clean data block exist there is no owner and CCI is not used when another cache tries to retrieve the data block from memory To enable the use of CCI when multiple shared copies of a clean data block exist we introduce a new cache data block state Clean owned shared COS This state can only be entered from the clean exclusive CE state The state transition from CE to COS is summarized as follows initial state other cached ops actions by this cache final state cleanExclusive CE no CR CCI …
View Full Document
Unlocking...