U of U CS 7810 - Memory Access Protocols - D2517763

School: University of Utah

Course: CS 7810 - Advanced Computer Architecture

CS7810, School of Computing, University of Utah

DRAM Memory Access Protocols
develop generic model for thinking about timing
Reference: “Memory Systems: Cache, DRAM, Disk” & Micron website
Bruce Jacob, Spencer Ng, & David Wang
Today’s material & any uncredited diagram came from chapter 11

Generic Structure
Read sequence
Write: reverse 2,3,4

Abstract Command Structure
• Reality: huge variety of command sequences possible
  » all with heavily constrained timing issues
    • 2 roles of timing
      – 1) physical: latency, set-up and hold, signal integrity, lane retiming
      – 2) power: limit concurrency to stay under thermal/power ceiling
• Start simple: command & phase overlap
[Figure labels: duration of multiple bank resource usage; phase 2 duration; CMD duration]
Note other overlaps - also specified by timing parameters

Row Access Command
• Row activation: move data from the mats to sense amps and restore the mats
  » controlled by 2 timing parameters
    • tRCD - row-to-column command delay
      – time to move the data from the mats to the sense amps
      – after a RAS command + tRCD: column reads or writes can commence
    • tRAS - interval between a RAS command and row restore
      – after a RAS command + tRAS, sense amps can be precharged to activate another row

Column Read Command
• Bank specific: move data from sense amps through I/O’s to the Mem_Ctlr
  » 3 timing parameters
    • tCAS (or tCL) - column address strobe
      – time between col-rd (CAS) command and data valid on the data bus
      – DDRx devices do this in short continuous bursts
    • tCCD - minimum column to column command delay due to burst I/O gating
      – 1 cycle for DDR, 2 cycles for DDR2, 4 cycles for DDR3, etc.
    • tBURST - duration of the data burst on the bus
Note: some devices have tCCD > tBURST, where tCCD becomes the limiting factor in what can happen next

Column Write Command
• Move data from mem_ctrl to sense amps: timing parameters
  » tCWD - delay between col-write and data valid on bus from mem-ctrlr
    • some per device differences
      – SDRAM: tCWD is typically 0
      – DDR - typically 1 memory clock cycle
      – DDR2 - tCAS - 1 cycle
      – DDR3 - tCWD is programmable
  » Other parameters control a subsequent command’s timing
    • tWTR - write to read delay
      – end of write data burst to column read command delay
    • tWR - write recovery delay
      – min. interval between end of a write data burst and start of a precharge command
      – I/O gating allowed to overdrive sense amps prior to col-rd-cmdn (mat restore)
    • tCMD - time command occupies command bus

Column Write Overview

Precharge Command
• Basic sequence: precharge --> RAS --> (CAS R/W)* --> precharge ...
• Timing constraints
  tRP - row precharge delay
  » time delay between precharge and row access command
  tRC - row cycle time
  » tRC = tRAS + tRP
  » limits independent row access commands in same bank

Refresh
• Necessary evil of 1T1C DRAM density advantage
  +: density improves $/bit
  » but the T is not a perfect switch due to leakage
  -: parasitic
  » power, bandwidth, and resource availability
• Refresh approach varies
  options exist to reduce 1 of the parasitic effects
  » total refresh power will be constant
    • reduced peak power of the device has some options
  typical
  » concurrent row precharge in all of the device’s banks
    • mem_ctlr issues periodic refresh commands
    • most devices contain row precharge address counter
      – holds addr. of last precharged row
    • tRFC - refresh cycle time
      – duration between refresh commands and an activation (RAS) command

Refresh Overview
• Typical refresh model is block refresh
  refresh entire device all at once
  » avoids trying to be smart & associated control complexity
  » refresh counter wraps to 0 to indicate done

Refresh Trends
• tRFC is going up
  decreases availability ==> slower system
  memory vendor choice
  » keep inside the 64 ms refresh period
    • even though the number of rows goes up

Family  Vdd   Capacity (Mb)  # Banks  # Rows  Row Size (kB)  Refresh Count  tRC (ns)  tRFC (ns)
DDR     2.5V  256            4        8192    1              8192           60        67
DDR     2.5V  512            4        8192    2              8192           55        70
DDR2    1.8V  256            4        8192    1              8192           55        75
DDR2    1.8V  512            4        16384   1              8192           55        105
DDR2    1.8V  1024           8        16384   1              8192           54        127.5
DDR2    1.8V  2048           8        32768   1              8192           ~         197.5
DDR2    1.8V  4096           8        65536   1              8192           ~         327.5

Other Refresh Options
• All have control overhead
  usually pushed to memory controller
  » since device vendors need to minimize $/bit
    • device could do it
      – classic cost-performance dilemma
• Separate bank refresh
  allow a bank to be refreshed
  » while other bank accesses are still allowed
    • bandwidth win since memory bus can still be active
    • peak power win since 1 RAS on command bus at a time
    • mem_ctlr schedule gets harder
  next step
  » only refresh what is going to expire
    • huge scheduling problem - probably too hard

Effects of Variable Command Sequences
• Significant performance variation
• Best case
  read everything in a row and move to next row
  » 1-2 kB in a row - lots of energy expended
    • pass 64-128 B cache-lines to the mem_ctlr
    • access all 8-32 cache lines before opening another row in same bank
      – low probability
      – observed trend: as core # increases, $ lines/row approaches 1
  open page memory systems - typical
  » keep row buffer open hoping for the best
    • w/ additional energy cost
• Worst case
  Precharge --> RAS --> single CAS --> precharge ...
  closed page memory systems
  » expect the worst
  but why not make the row smaller?

Read and Write Sequences
Note: % of time data bus bandwidth is utilized

Compound Commands
• DRAM evolution allows compound commands
  » mem_ctlr options and scheduling complexity increase
  column read and precharge
  » use when next scheduled access is to a new row
    • 2 commands rather than 3
    • timing constraints carried over however

Other DDR2 Trends
• tRAS lockout internal