Chapter 7: The Memory Hierarchy – Part I
S07 Mark Franklin

Slide titles:
• Memory Hierarchy Outline (1)
• Memory Hierarchy Outline (2)
• Memory Technology Characteristics
• AMD Athlon
• Typical Disk Drive: SATA 750Gb
• Memory Performance Gap
• Levels of the Memory Hierarchy
• The CPU–Memory Interface
• The CPU–Memory Interface (cont’d.)
• Memory Performance Parameters
• Memories: Basic Technologies
• Memory Cell Structure: Basic D-Latch
• An 8-Bit Register as a 1-D RAM Array
• A 4 x 8 2-D Memory Cell Array
• A 64 K x 1 Static RAM Chip
• A 16 K x 4 SRAM Chip
• Matrix & Tree Decoders
• 6-Transistor Static RAM Cell
• Static RAM Read Operation
• Static RAM Write Operations
• Example Commercial Product
• Dynamic RAM Organization
• DRAM Chip Organization
• DRAM Read and Write Cycles
• DRAM Refresh & Row Access
• DRAM Commercial Product
• A 2-D CMOS ROM Chip
• ROM Types
• Memory Boards and Modules
• General Structure of a Memory Chip
• Word Assembly from Narrow Chips
• Increasing the Num. of Words by a Factor of 2^k
• Chip Using 2 Chip Selects
• 3-Dimensional Dynamic RAM Array
• A Memory Module and Its Interface
• Dynamic RAM Module with Refresh Control
• Two Kinds of Memory Module Organizations
• Timing Advantage of Interleaving

The slides of Part I are taken in large part from V. Heuring & H.
Jordan, “Computer Systems Design and Architecture”.

Memory Hierarchy Outline (1)
• Memory components:
  – RAM memory cells & cell arrays.
  – Static RAM: more expensive, but less complex.
  – Tree and matrix decoders: needed for large RAM chips.
  – Dynamic RAM: less expensive, but needs “refreshing”.
    • Chip organization
    • Timing
  – ROM: read-only memory.
• Memory boards
  – Arrays of chips give more addresses and/or wider words.
  – 2-D and 3-D chip arrays.
• Memory modules
  – Large systems can benefit by partitioning memory for:
    • separate access by system components.
    • fast access to multiple words.

Memory Hierarchy Outline (2)
• The Memory Hierarchy: from fast & expensive to slow & cheap:
  – Registers → Cache → Main Memory → Disk
  – Cache: high speed, expensive (1st level on-chip, 2nd level off-chip)
    • Design types: direct mapped, associative, set associative
  – Virtual memory: makes the hierarchy to disk transparent
    • Address translation: logical address → physical address
    • Memory management: control of information movement between levels.
    • Multiprogramming, multithreading: computation while waiting for memory → improved efficiency and resource utilization.
    • The “TLB”: for speeding up the address translation process.
• Memory as a subsystem: overall performance.

Memory Technology Characteristics

Level  Memory Type      Average Access Time  Typical Size  Unit of Transfer (Block Size)
1      Cache (on-chip)  .25 – 10 ns          8KB - 8MB     Word, 16-64 bits
2      Main Memory      40 – 200 ns          2MB - 32GB    Cache line, 8B-32B
3      Disk             5 – 10 ms            ~ 1,000Gb     Page, 4KB-16KB
4      Magnetic Tape    1 – 5 sec            > 1,000Gb     Record, 16KB

AMD Athlon
[Figure]

Typical Disk Drive: SATA 750Gb
[Figure]

Memory Performance Gap
[Figure: Processor-DRAM Memory Gap (latency). Performance (log scale, 1 to 1000) versus time, 1980 to 2000, with one curve for CPU and one for DRAM.]
• µProc (“Moore’s Law”): 60%/yr. (2X/1.5 yr)
• DRAM: 9%/yr. (2X/10 yrs)
• Processor-memory performance gap: grows 50% / year.

Levels of the Memory Hierarchy: Capacity,
Access Time, Cost (upper levels are faster, lower levels are larger):

Level          Capacity       Access Time    Cost
CPU Registers  100s Bytes     <2s ns
Cache          100s K Bytes   .3 - 2 ns      1-0.1 cents/bit
Main Memory    1000s M Bytes  100ns - 400ns  $.0001-.00001 cents/bit
Disk           100s G Bytes   10 ms          10^-6 - 10^-7 cents/bit
Tape           infinite       sec-min        10^-8 cents/bit

Staging transfer unit between adjacent levels:

Between levels      Managed by      Xfer Unit
Registers ↔ Cache   prog./compiler  Instr. operands, 1-8 bytes
Cache ↔ Memory      cache           Blocks, 16 - 256 bytes
Memory ↔ Disk       OS              Pages, 512-8K bytes
Disk ↔ Tape         user/operator   Files, Mbytes

The CPU–Memory Interface
Sequence of events:
Read:
1. CPU loads MAR & then issues Read & REQUEST.
2. Main memory transmits words to MDR.
3. Main memory asserts COMPLETE.
Write:
1. CPU loads MAR & MDR, asserts Write & REQUEST.
2. Main memory stores the MDR value at the address held in MAR: (MDR) → M[(MAR)].
3. Main memory asserts COMPLETE.
[Figure: CPU (register file, MAR, MDR) connected to main memory by an m-bit address bus (A0–Am−1), a b-bit data bus (D0–Db−1), and the control signals R/W, REQUEST and COMPLETE. Main memory holds addresses 0 to 2^m − 1, each s bits wide; the CPU word is w bits.]

The CPU–Memory Interface (cont’d.)
Additional points:
• Multiple data transfers: if b < w → w/b b-bit transfers.
• Fractional word transfers: some CPUs can transfer < w bits. Example: Intel 8088: m = 20; 8- and 16-bit values can be read & written.
• For very fast memory, or a known response time, COMPLETE may be omitted.
• Separate R & W lines may be used, & the REQUEST line omitted.

Memory Performance Parameters

Symbol            Definition         Units       Meaning
t_a               Access time        time        Time to access a memory word.
t_c               Cycle time         time        Time from start of access to start of next access.
k                 Block size         words       Number of words per block.
ω                 Bandwidth          words/time  Word transmission rate.
t_l               Latency            time        Time to access first word of a sequence of words.
t_bl = t_l + k/ω  Block access time  time        Time to access an entire block of words.

(Information is stored & moved in blocks at the cache & disk level.)

Memories: Basic Technologies
• SRAM:
  – value is stored on a pair of inverting gates
  – very fast but takes up more space than DRAM (4 to 6 transistors)
    • Cross
Coupled gates (more later)
• DRAM:
  – value is stored as a charge on a capacitor (must be refreshed)
  – very small (hence higher density) but slower than SRAM (factor of 5 to 10)
[Figure: DRAM cell. Word line (selects bit), pass transistor, capacitor, bit line (writes/reads bit).]

Memory Cell Structure: Basic D-Latch
RAM memory cells must provide:
1) Select
2) DataIn
3) DataOut
4) R/W
[Figure: D-latch cell with Select, DataIn, DataOut and R/W connections.]

An 8-Bit Register as a 1-D RAM Array
The entire register is selected with one select line, and uses one R/W line.
[Figure: eight D cells, d0–d7, sharing one Select line and one R/W line, each with its own DataIn and DataOut.]

A 4 x 8 2-D Memory Cell Array
• R/W is common to all cells.
• 2-bit address (A1, A0).
• Bidirectional 8-bit buffered data bus (d0–d7).
• 2-4 line decoder selects one of the four 8-bit arrays.
[Figure: four rows of eight D cells, selected by a 2–4 decoder.]
How does this scale?

A 64 K x 1 Static RAM Chip
• ~square array fits IC design paradigm.
• Select rows separately from columns → 256 x 2 = 512 circuit logic components instead of 65,536!
CS, Chip
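
As a rough numerical check of three quantities from the slides above (the block access time t_bl = t_l + k/ω, the w/b bus transfers needed when b < w, and the 512 versus 65,536 select count for the 64 K x 1 chip), here is a minimal Python sketch. The function names and the sample values for t_l, k, ω, w and b are assumptions chosen for illustration; they are not taken from the slides.

```python
import math


def block_access_time(t_l, k, omega):
    """t_bl = t_l + k/omega: latency to the first word, plus the time
    to stream k words at a bandwidth of omega words per unit time."""
    return t_l + k / omega


def bus_transfers(w, b):
    """Number of b-bit bus transfers needed to move one w-bit word.
    Equals w/b when b divides w; rounded up otherwise (an assumption)."""
    return math.ceil(w / b)


def select_lines(n_cells):
    """Compare one select per cell with separate row and column selects.
    Assumes a perfectly square cell array, as in the 64 K x 1 example."""
    side = math.isqrt(n_cells)
    if side * side != n_cells:
        raise ValueError("this sketch only handles square arrays")
    return n_cells, 2 * side


if __name__ == "__main__":
    # Hypothetical values: 100 time units of latency, 8-word block,
    # bandwidth of 0.5 words per time unit.
    print(block_access_time(t_l=100.0, k=8, omega=0.5))
    # Hypothetical 32-bit word moved over an 8-bit data bus.
    print(bus_transfers(w=32, b=8))
    # 64 K cells: per-cell selects versus row + column selects.
    print(select_lines(64 * 1024))
```

The last call reproduces the slide's comparison: a 256 x 256 array needs 256 row selects plus 256 column selects, rather than one select per cell.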