A Holistic Approach to DRAM


A HOLISTIC APPROACH to DRAM
Bruce Jacob, University of Maryland

SLIDE 1
Prof. Bruce Jacob
Electrical & Computer Engineering
University of Maryland, College Park

OUTLINE
• Anecdotes, Vision
• Our Past & Present Work
• Anecdotes Revisited
• Conclusions

SLIDE 2
Anecdote I: System Issues
[Figure: Cycles per Instruction (CPI, 0 to 3) versus System Bandwidth (0.8, 1.6, 3.2, 6.4, 12.8, 25.6 GB/s, where GB/s = Channels * Width * 800MHz), for 32-Byte, 64-Byte, and 128-Byte Bursts. Benchmark = GCC (SPEC 2000), 2 banks/channel. Organizations plotted: 1 chan x 1 byte; 1 chan x 2 bytes, 2 chan x 1 byte; 1 chan x 4 bytes, 2 chan x 2 bytes, 4 chan x 1 byte; 1 chan x 8 bytes, 2 chan x 4 bytes, 4 chan x 2 bytes; 2 chan x 8 bytes, 4 chan x 4 bytes; 4 chan x 8 bytes.]

SLIDE 3
Anecdote II: DDR’s DLL
[Figure: READ timing and block diagram without a DLL. Labels: DRAM Arrays, CMD, CK EXT, CK Bufs, CK INT, DQ EXT; D_CLK = delay of clock from clock input pad; D_DQ = delay of data to output drivers; total D_CLK + D_DQ; "Ideally aligned".]

SLIDE 4
Anecdote II: DDR’s DLL
Point of DLL: to align DQ output with system clock (minimize internal skew by eliminating D_CLK)
[Figure: READ timing and block diagram with a DLL. Labels: DRAM Arrays, CMD, CK EXT, CK Bufs, CK INT, DQ EXT, DLL; D_CLK = delay of clock; D_DLL = delay introduced by DLL; D_DQ; D_CLK + D_DLL; "Aligned"; "Delayed".]

SLIDE 5
Anecdote II: DDR’s DLL
A handful of alternatives:
• Unassisted
• DLL on DRAM
• DLL on MC
• DLL on module
• Read clock
• Static delay w/ recalibration
[Figure: one diagram per alternative. Labels: MC, D, DATA, strobe, RCLK, DLL, V, DIMM.]

SLIDE 6
Anecdote III: Circuit v System
t_RRD & t_FAW limitations:
[Figure: timing diagram with rows Cmd, Internal cmd, Data, annotated with t_RRD and t_FAW. R = Row Activation Command, C = Column Read Command.]
t_DQS limitations:
[Figure: timing diagram with rows Clock, Cmd, Internal Cmd, Data, annotated with t_DQS.]

SLIDE 7
Vision
Must make circuit-level decisions considering system-level ramifications (holistic approach)

SLIDE 8
Past Work: Device-Level
FPM, EDO, SDRAM, ESDRAM, DDR:
[Figure: CPU and caches, Memory Controller, 128-bit 100MHz bus, DIMM of eight x16 DRAMs.]
Rambus, Direct Rambus, SLDRAM:
[Figure: CPU and caches, Memory Controller, 128-bit 100MHz bus, eight DRAMs on a Fast, Narrow Channel.]
[Cuppu et al. ISCA 1999]

SLIDE 9
Past Work: Device-Level
Average Latencies
[Figure: Avg Time per Access (ns), 0 to 500, by DRAM Architecture (FPM, EDO, SDRAM, ESDRAM, DDR, SLDRAM, RDRAM, DRDRAM); benchmark PERL. Components: Bus Transmission Time, Row Access Time, Column Access Time, Data Transfer Time Overlap, Data Transfer Time, Refresh Time, Bus Wait Time. Annotations: "Critical word arrival times", "Newer DRAMs".]
[Cuppu et al. ISCA 1999]

SLIDE 10
Past Work: Device-Level
Bandwidth-Enhancing Techniques I:
[Figure: Cycles Per Instruction (CPI), 0 to 5, by DRAM Architecture (FPM, EDO, SDRAM, ESDRAM, DDR, SLDRAM, RDRAM, DRDRAM) for Yesterday’s CPU, Today’s CPU, and Tomorrow’s CPU; benchmark PERL. Components: Processor Execution, Overlap between Execution & Memory, Stalls due to Memory Latency, Stalls due to Memory Bandwidth. Annotation: "Newer DRAMs".]
[Cuppu et al. ISCA 1999]

SLIDE 11
Past Work: Device-Level
Bandwidth-Enhancing Techniques II:
[Figure: Execution Time in CPI, PERL (10GHz CPU). Cycles Per Instruction (CPI), 0 to 5, by DRAM Architecture (FPM/interleaved, EDO/interleaved, SDRAM & DDR, SLDRAM x1/x2, RDRAM x1/x2). Components: Processor Execution, Overlap between Execution & Memory, Stalls due to Memory Latency, Stalls due to Memory Bandwidth.]
[Cuppu et al. ISCA 1999]

SLIDE 12
Past Work: System-Level
Even when we restrict our focus …
• 1, 2, 4 800 MHz Channels
• 8, 16, 32, 64 Data Bits per Channel
• 1, 2, 4, 8 Banks per Channel (Indep.)
• 32, 64, 128 Bytes per Burst
[Figure: organization diagrams built from C and D blocks. Panels: One independent channel, Banking degrees of 1, 2, 4, ...; Two independent channels, Banking degrees of 1, 2, 4, ...; Four independent channels, Banking degrees of 1, 2, 4, ...]
[Cuppu & Jacob ISCA 2001]

SLIDE 13
Past Work: System-Level
... the design space is FAR from regular …
[Figure: Cycles per Instruction (CPI, 0 to 3) versus System Bandwidth (0.8, 1.6, 3.2, 6.4, 12.8, 25.6 GB/s, where GB/s = Channels * Width * 800MHz), benchmark GCC, for 32-Byte, 64-Byte, and 128-Byte Bursts, over the same twelve channel-by-width organizations as Slide 2.]
[Cuppu & Jacob ISCA 2001]

SLIDE 14
Past Work: System-Level
... and the cost of poor judgment is high.
[Figure: Cycles per Instruction (CPI), 0 to 10, for SPEC 2000 Benchmarks (bzip, gcc, mcf, parser, perl, vpr, average), comparing Best Organization, Average Organization, and Worst Organization.]
[Cuppu & Jacob ISCA 2001]

SLIDE 15
An Aside
Past work used first-order models.
Present work uses models accurate to second & third order effects …

SLIDE 16
[ Definition: Zero’th Order ]
    ...
    if ( INSTR.is_loadstore ) {
        if (L1_cache_miss( INSTR.daddr )) {
            if (L2_cache_miss( INSTR.daddr )) {
                cycles += DRAM_LATENCY;
                    OR
                INSTR.ready = now() + DRAM_LATENCY;
            }
        }
    }
    ...

SLIDE 17
An Aside
Past work used first-order models.
Present work uses models accurate to second & third order effects …

SLIDE 18
Past & Present Work
• IEEETC 1996: System-level analytical tool for cost/performance
• ISCA 1999, IEEETC 2001: DRAM device-level characterization
• CASES 2001, IEEETC 2003: Performance & energy modeling of CPU and SRAM (model executes unmodified RTOS)
• ISCA 2001: DRAM system-level characterization
• SPIE 2005: SystemC modeling of energy in systems-on-chip
[Figure labels: CPU, $, IEEETC 1996, ISCA 1999, IEEETC 2001]
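
The bandwidth axis on Slides 2 and 13 follows from the formula printed on the axis label, GB/s = Channels * Width * 800MHz. The Python sketch below is only an illustration of that arithmetic, not the simulator behind the slides: it enumerates the 1, 2, 4 channel counts and 1, 2, 4, 8 byte widths (8 to 64 data bits, per Slide 12) and groups the organizations that share a bandwidth. The function names `system_bandwidth_gbs` and `group_by_bandwidth` are hypothetical, chosen for this example.

```python
from collections import defaultdict

CLOCK_MHZ = 800  # per-channel rate taken from the slides' axis label


def system_bandwidth_gbs(channels, width_bytes, clock_mhz=CLOCK_MHZ):
    """Peak bandwidth in GB/s: Channels * Width * 800MHz (slide formula)."""
    return channels * width_bytes * clock_mhz / 1000


def group_by_bandwidth(channel_counts=(1, 2, 4), widths_bytes=(1, 2, 4, 8)):
    """Map each peak bandwidth to the (channels, width) pairs that reach it."""
    groups = defaultdict(list)
    for channels in channel_counts:
        for width in widths_bytes:
            groups[system_bandwidth_gbs(channels, width)].append((channels, width))
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    for bandwidth, orgs in group_by_bandwidth().items():
        labels = ", ".join(f"{c} chan x {w} byte(s)" for c, w in orgs)
        print(f"{bandwidth:5.1f} GB/s: {labels}")
```

The twelve organizations fall into six bandwidth groups (0.8 through 25.6 GB/s), so up to three different organizations share a single x-axis position. That is why the plots can show different CPI at the same peak bandwidth: the formula fixes the peak rate but says nothing about banking, burst size, or how requests spread across channels.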

