UMD CMSC 411 - Lecture 16 Memory Hierarchy 3

School: University of Maryland, College Park
Course: CMSC 411 - Computer Systems Architecture

CMSC 411 Computer Systems Architecture
Lecture 16: Memory Hierarchy 3 (Main Memory, Virtual Memory)
CMSC 411 - 14 (some from Patterson, Sussman, others); CS252 S05

Main memory management

Questions:
- How big should main memory be?
- How should reads and writes be handled?
- How do we find something in main memory?
- How do we decide what to put in main memory?
- If main memory is full, how do we decide what to replace?

The scale of things

Typical memory hardware, as of 2000:

Level           Size     Access time
Registers       1 KB     0.25-0.5 ns
Cache           8 MB     0.5-25 ns
Main memory     4 GB     150-250 ns
Disk storage    30 GB    5,000,000 ns (5 ms)

Memory hardware

Memory technology: CMOS (Complementary Metal Oxide Semiconductor) uses a combination of n-doped and p-doped semiconductor material to achieve low power dissipation.

DRAM (dynamic random access memory), typically used for main memory:
- One transistor per data bit.
- Each bit must be refreshed periodically (e.g., every 8 milliseconds), so perhaps 5% of the time is spent in refresh.
- Access time < cycle time.
- The address is sent in two halves (row access, then column access) so that fewer pins are needed on the chip.

SRAM (static random access memory), typically used for cache memory:
- 4-6 transistors per data bit.
- No need for refresh.
- Access time = cycle time.
- The address is sent all at once, for speed.

Bottleneck

Main memory access will slow down the CPU unless the hardware designer is careful. Several techniques can improve memory bandwidth, i.e., the amount of data that can be delivered from memory in a given amount of time:
- wider main memory
- interleaved memory
- independent memory banks
- avoiding memory bank conflicts

Wider main memory

On a cache miss, if a cache block contains k words, then the same sequence of per-word steps is repeated k times.

Wider cache lines. Extra costs:
- a wider memory bus: hardware to deliver 32n bits in parallel instead of 32 bits
- a multiplexer to choose the correct 32 bits to transmit from the cache to the CPU

The steps repeated for each of the k words on a cache miss are:
1. Send the address to main memory.
2. Access the word (i.e., locate it).
3. Send the word to the cache, with its bits transmitted in parallel.

Idea behind wider memory: the user thinks in terms of 32-bit words, but physical memory can have longer words. Then the steps above are done only k/n times, where n is the number of 32-bit words in a physical word.

Interleaved memory

Partition memory into banks, each able to access a word and send it to the cache in parallel. Organize the address space so that adjacent words live in different banks; this is called interleaving. For example, 4 banks might hold words with the following (octal) addresses:

Bank 0   Bank 1   Bank 2   Bank 3
  00       01       02       03
  04       05       06       07
  10       11       12       13

Note how well interleaving suits write-through. It also helps speed up reads and write-back.

Note: interleaved memory acts like wide memory, except that the words are transmitted over the bus sequentially, not in parallel.

Independent memory banks

Each bank of memory has its own address lines and usually its own bus. A machine can have several independent banks, perhaps one for instructions and one for data; this is called a Harvard architecture. Banks can operate independently without slowing the others.

Avoiding memory bank conflicts

Use a prime number of memory banks. Arrays frequently have even dimension sizes, often powers of 2, and a stride that matches the number of banks (or a multiple of it) gives very slow access, because every access then lands in the same bank. With a prime number of banks (other than 2), a power-of-2 stride is never a multiple of the bank count.

How much good do these techniques do?

Example: interleaving

    int x[256][512];
    int i, j;

    for (j = 0; j < 512; j = j + 1)
        for (i = 0; i < 256; i = i + 1)
            x[i][j] = 2 * x[i][j];

The loop first walks down the first column of x: x[0][0], x[1][0], x[2][0], ..., x[255][0], with addresses
K, K + 512 × 4, K + 512 × 8, ..., K + 512 × 4 × 255, where K is the address of x[0][0] and each int occupies 4 bytes. Consecutive elements of the column are 512 words apart, and 512 is a multiple of 4, so with 4 interleaved memory banks all of these elements live in the same bank and the CPU will stall in the worst possible way.

CMSC 411 - 15 (some from Patterson, Sussman, others)

Virtual addressing

Computers are designed so that multiple programs can be active at the same time. When a program is compiled, the compiler has to assign an address to each data item. But how can it know which memory addresses are being used by other programs? Instead, the compiler assigns virtual addresses and expects the loader/OS to provide the means to map them into physical addresses.

Memory protection

Each program lives in its own virtual space, called its process. While the CPU is working on one process, others may be partially completed or waiting for attention. The CPU is time-shared among the processes, working on each in turn, and main memory is also shared among processes.

In the olden days

The loader would locate an unused set of main memory addresses and load the program and data there. A special register, called the relocation register, held a base address, and every address the program used was interpreted relative to that base. So if the program jumped to location 54, the jump would really go to 54 + (contents of the relocation register). Something similar, perhaps with a second register, happened for data references.

In the less olden days

It became difficult to find a contiguous segment of memory big enough to hold the program and its data, so the program was divided into pages. Each page is stored contiguously, but different pages can go in any available spot, either in main memory or on disk. This is the virtual addressing scheme: to the program, memory looks like one contiguous segment, but the data is actually scattered across main memory and perhaps disk.
Example: miss penalties. Assume a cache block of 4 words, and
- 4 cycles to send the address to main memory
- 24 cycles to access a word once the address arrives
- 4 cycles to send a word back to the cache

Then:
- Basic miss penalty: 4 × 32 = 128 cycles, since each of the 4 words pays the full 4 + 24 + 4 = 32-cycle penalty.
- Memory with a 2-word width: 2 × (32 + 4) = 72-cycle miss penalty.
- Simple interleaved memory: the address can be sent to every bank simultaneously, so the miss penalty is 4 + 24 + 4 × 4 (for sending the words) = 44 cycles.
- Independent memory banks: 32-cycle miss penalty as long ...
