O-K-State ECEN 6253 - Multiprocessors - D2005199

School: Oklahoma State University
Course: ECEN 6253 - Adv Top Comp Arch


ECEN 6253 Advanced Digital Computer Design, Multiprocessors (Cont.), April 16, 2006

Multiprocessors (Cont.)

Multi-Level Cache Coherence

Modern processors all incorporate multiple levels of cache to minimize the average latency of (local) processor memory requests. In multiprocessor systems built from these processors, each local processor has its own multi-level cache. We then have the problem of maintaining cache coherence across all the levels of cache for all the processors.

Inclusion. A cache hierarchy is inclusive if all entries in a low level (close to the processor) cache are also included in the entries of all higher (farther from the processor) levels of cache. Another way of saying this is that the entries in lower level caches are subsets of the entries in higher level caches. Inclusion can be maintained as follows:

1. When filling a low level cache from a high level cache, all intermediate levels are also filled with the same entry.
2. When evicting a cache entry from a high level cache, the same entry is also evicted from all lower levels of cache.

While most cache hierarchies implement the first requirement, meeting the second requirement implies that each level of cache must keep track of which of its entries are in the next lower level cache. Not all cache hierarchies implement the second requirement; those that do not are called non-inclusive cache hierarchies.

Non-Inclusive Multi-Level Cache Coherence. Each level of each processor's cache must maintain coherence independently. For snooping implementations, each cache level must monitor the memory bus for matching addresses. For the level 1 data cache, this usually requires a multi-port cache tag implementation to keep up with local processor requests and memory coherence checks at the same time. Such bus monitoring is not necessary in a directory based system. However, the directory must now have an entry for each cache level of each processor in the shared list.

Inclusive Multi-Level Cache Coherence. Only the highest level processor cache must maintain coherence. The coherence of lower levels is assured by the inclusion mechanism. Here, we are assuming that the inclusion mechanism is extended to modify the status bits in lower level caches whenever changes are made to the status bits in the higher level cache. Inclusion makes snooping much more practical in that only the highest level cache has to monitor the memory bus for matching addresses. The lower level caches are interrupted only for addresses that match an entry in the highest level cache, and do not have to continuously monitor the memory bus. Inclusion also simplifies directory based systems because there is only one entry for each processor in the processor shared list.

Memory Consistency

With cache coherence protocols, we have ensured that every processor sees the same shared memory values. We have not yet ensured that the "correct" value will be seen by the processors. With a single thread of execution, it was easy to decide that the correct value was the one produced as if the instructions were executed one at a time in sequential order. With multiple threads executing simultaneously (on multiple processors) and no set ordering of instructions between different threads, it is no longer obvious what the "correct" value should be. Different memory models are used to provide consistent operation of the system.

Sequential Consistency. As shown in fig. 11-7, p. 577, sequential consistency (SC) is defined such that multiple threads running on different processors behave as if they were time-shared on a single processor. This has the advantage that the large body of software already developed for time-shared systems should work on an SC multiprocessor system.

Only load and store instructions access memory in modern instruction sets. The order of other instructions is unimportant for SC. In a uniprocessor system, loads and stores from all threads get ordered by the memory bus controller, which grants access to one load/store at a time. In a multiprocessor system, the loads and stores are ordered by the shared bus controller in a snooping system or by the memory directory controller in a directory system.

Strict imposition of SC would seem to require that all loads and stores from every processor be done in order. We have previously seen that single processor performance can be increased by allowing out of order loads and stores. Strict ordering is needed only for those loads and stores with data dependences. The same techniques can be used to obtain an efficient implementation of an SC multiprocessor system.

For example, suppose we wish to allow load bypassing on an SC multiprocessor. Recall that uniprocessor load bypassing, fig. 5-34, p. 270, allows early speculative execution of loads and checks each load address for matches with pending stores. Ordering is enforced by canceling the load until after the matching store is retired. An SC multiprocessor would have to do the same, and would also have to cancel loads whose addresses match writes (stores) from other processors seen on the external address bus, as in fig. 11-8, p. 579. The SC memory model requires that any store appearing on the bus before the load completes must be regarded as occurring before the load. Note that the bus address must be compared against all of the loads in the reorder buffer.

True SC requires that all loads coming after a store must "see" the same memory state that exists when the preceding store actually takes place, that is, when the store is retired. We cannot allow any loads to complete before previous stores have retired. Unlike uniprocessors, stores cannot be removed from the reorder buffer before they are retired if SC is to be maintained in a multiprocessor. This can result in significant performance loss.

Relaxed Consistency. There are several relaxed consistency (RC) memory models. They allow various degrees of reordering of loads and stores from
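The two inclusion rules from the Inclusion paragraph can be illustrated with a small Python sketch. It is only a toy model under stated assumptions, not a description of any real cache: the caches are fully associative, hold tags only, and use LRU replacement, and the names `Cache`, `InclusiveHierarchy`, `access`, and `is_inclusive` are hypothetical.

```python
from collections import OrderedDict


class Cache:
    """Toy fully associative, tags-only cache with LRU replacement (assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> True, oldest entry first

    def touch(self, addr):
        """Return True on a hit and mark the line most recently used."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def fill(self, addr):
        """Insert addr; return the evicted address, or None if nothing was evicted."""
        victim = None
        if addr not in self.lines and len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        return victim

    def invalidate(self, addr):
        self.lines.pop(addr, None)


class InclusiveHierarchy:
    """levels[0] is closest to the processor; levels[-1] is the highest level."""

    def __init__(self, capacities):
        self.levels = [Cache(c) for c in capacities]

    def access(self, addr):
        # Find the lowest level that hits; a miss everywhere means main memory.
        hit_level = len(self.levels)
        for i, cache in enumerate(self.levels):
            if cache.touch(addr):
                hit_level = i
                break
        # Rule 1: fill every level below the source, highest level first.
        for i in range(hit_level - 1, -1, -1):
            victim = self.levels[i].fill(addr)
            if victim is not None:
                # Rule 2: an eviction at level i is repeated at all lower levels.
                for lower in self.levels[:i]:
                    lower.invalidate(victim)

    def is_inclusive(self):
        """Check that each level's entries are a subset of the next level's."""
        return all(
            set(low.lines) <= set(high.lines)
            for low, high in zip(self.levels, self.levels[1:])
        )
```

In this model a line that is recently used in the lowest level can still be removed from it when the higher level evicts that line, which is the cost of the second rule.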
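The SC definition above (threads behave as if time-shared on a single processor) can be made concrete by enumerating every interleaving of a small program. The two-thread program here is an assumed example, not taken from the notes: each thread stores 1 to its own variable and then loads the other thread's variable. The helper names `interleavings` and `run` are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, reg in schedule:
        if op == "store":
            mem[var] = 1
        else:
            regs[reg] = mem[var]
    return regs["r1"], regs["r2"]


# Assumed example program: each thread stores, then loads the other variable.
thread0 = [("store", "x", None), ("load", "y", "r1")]
thread1 = [("store", "y", None), ("load", "x", "r2")]

# Every (r1, r2) result that some sequentially consistent execution can produce.
outcomes = {run(s) for s in interleavings(thread0, thread1)}
```

Under SC, `outcomes` never contains `(0, 0)`, because in any single interleaving at least one store precedes the other thread's load. A machine that lets loads complete before earlier stores retire could produce `(0, 0)`, which is why the notes require stores to retire first.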
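The load bypassing checks described for an SC multiprocessor can be sketched as follows. This is a simplified software model under stated assumptions, not the hardware of fig. 11-8: addresses are compared for exact equality rather than by cache line, and the names `Load`, `LoadQueue`, `try_execute`, and `snoop_write` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Load:
    tag: int                # program-order position (assumed field)
    addr: int
    executed: bool = False  # True once the load has speculatively read its data


@dataclass
class LoadQueue:
    loads: list = field(default_factory=list)           # un-retired loads
    pending_stores: dict = field(default_factory=dict)  # tag -> addr, un-retired stores

    def try_execute(self, load):
        """Uniprocessor check: cancel the load if an older pending store matches."""
        conflict = any(
            s_tag < load.tag and s_addr == load.addr
            for s_tag, s_addr in self.pending_stores.items()
        )
        load.executed = not conflict
        return load.executed

    def snoop_write(self, addr):
        """Multiprocessor check: a store from another processor seen on the bus
        cancels every executed, un-retired load to the same address."""
        cancelled = [ld.tag for ld in self.loads if ld.executed and ld.addr == addr]
        for ld in self.loads:
            if ld.tag in cancelled:
                ld.executed = False  # the load must re-execute later
        return cancelled
```

A short usage example: with a pending store to `0x40` at tag 1, `try_execute(Load(2, 0x40))` cancels the load, while a load to `0x80` executes early and is cancelled later only if `snoop_write(0x80)` observes a matching remote store.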
