Memory Hierarchy and Cache Design (3)

Main Memory Background
[Figure: Conventional DRAM system]
• Performance metrics of main memory:
  – Latency: affects cache miss penalty
    » Access time: time between the request and when the desired word arrives
    » Cycle time: minimum time between requests
  – Bandwidth: affects I/O performance and cache miss penalty (especially when a large block is used in the L2 cache)
• Main memory uses DRAM (dynamic RAM)
  – Dynamic because it must be refreshed periodically (but requires only 1 transistor/bit)
  – Addresses are divided into 2 parts:
    » RAS, or Row Access Strobe
    » CAS, or Column Access Strobe
• Caches use SRAM (static RAM)
  – No refresh (but requires 6 transistors/bit)
  – Address is not divided, for fast access
• Comparison
  – Capacity: DRAM is 4-8 times that of SRAM
  – Cycle time: SRAM is 8-16 times faster than DRAM
  – Cost: SRAM is 8-16 times more expensive than DRAM

Trends in DRAM
Capacity improves by 60% per year
Row access time improves by 7% per year

Main Memory Organizations
• Basic memory organization
  – One-word-wide bus
  – 4 clock cycles to send the address
  – 24 clock cycles for the access time per word
  – 4 clock cycles to send a word of data
• Example
  – Cache block of 4 words
  – Miss penalty = 4 x (4 + 24 + 4) = 128 cycles

Faster Memory System
1. Wider main memory
2. Simple interleaved memory
3. Independent memory banks
4. Avoiding memory bank conflicts
5. DRAM-specific interleaving

First Technique: Wider Main Memory
• Cache miss penalty
  – Two-word-wide bus
    » 2 x (4 + 24 + 4) = 64 cycles
  – Four-word-wide bus
    » 1 x (4 + 24 + 4) = 32 cycles
• Drawbacks
  – Higher bus costs
  – A multiplexor is needed between the cache and the CPU
  – Reduced expandability
  – More frequent "read-modify-write" cycles in memories with error correction (a partial write must first read the whole protected unit)

Second Technique: Simple Interleaved Memory
• Memory consists of several DRAM chips
  – Each chip is capable of autonomous operation
• Organize memory chips in banks and issue memory requests to all banks at the same time
• Banks are one word wide
[Figure: Four-way interleaving]

Memory Interleaving
• Mapping addresses to banks affects the behavior of the memory system
  – Optimized for sequential access
• May spread consecutive addresses across several banks
  – Interleaving factor
  – Normally word interleaved
    » Can also be byte interleaved
    » Depends on the organization of the bank
• Goal: deliver information from a new bank on each cycle
  – Need at least as many banks as clock cycles to access a bank
• As memory chip size increases, fewer chips are used
  – Constructing multiple banks becomes difficult
• Restricted expandability
  – Memory can be increased only by doubling it
• Cache miss penalty
  – 4 + 24 + 4 x 4 = 44 cycles

Third Technique: Independent Memory Banks
• Multiple independent banks
  – Multiple memory controllers
  – Each bank uses separate address and data lines

Fourth Technique: Avoiding Memory Bank Conflicts
• Problem
    int x[256][512];
    for (j = 0; j < 512; j = j + 1)
        for (i = 0; i < 256; i = i + 1)
            x[i][j] = 2 * x[i][j];
  – Even with 128 word-interleaved banks there are conflicts: a row is 512 words, and 512 is a multiple of 128, so every access in a column falls in the same bank
• Software solutions
  – Loop interchange
  – Resizing the array
• Hardware solutions
  – Based on the Chinese Remainder Theorem
  – Prime number of banks
  – Bank number = address mod number of banks
  – Address within bank = address mod number of words in bank
  – The number of words in a bank is a power of 2, so it is coprime with an odd prime number of banks; each address therefore maps to a unique (bank, address within bank) pair

Avoiding Memory Bank Conflicts
[Figure: Example]

Fifth Technique: DRAM-Specific Interleaving
• Multiple column accesses: page mode
  – DRAM buffers a row of bits inside the DRAM for column access (e.g., a 16 Kbit row for a 64 Mbit DRAM)
  – Allows repeated column accesses without another row access
  – 64 Mbit DRAM: cycle time = 90 ns, optimized access = 25 ns
• New DRAMs
  – Example: RAMBUS
    » Each chip acts like a memory system
    » Uses a packet-switched bus
    » Is synchronous to the CPU clock
    » Returns a variable amount of data