U of U CS 7810 - DRAM, PCM

Lecture 14: DRAM, PCM
• Today: DRAM scheduling, reliability, PCM
• Class projects

TCM (Kim et al., MICRO 2010)
• Organize threads into latency-sensitive and bandwidth-sensitive clusters based on memory intensity; the former gets higher priority
• Within the bandwidth-sensitive cluster, priority is based on rank
• Rank is determined by the “niceness” of a thread and is periodically shuffled with insertion shuffling or random shuffling (the former is used if there is a big gap in niceness)
• Threads with low row buffer hit rates and high bank-level parallelism are considered “nice” to others

ECC
• For a BCH code, to correct t errors in k-bit data, need an r-bit code, r = t * ceil(log2 k) + 1
• For DRAM, typically, an 8-bit ECC (Hamming) code is attached to every 64-bit word; can recover from a single-bit corruption
• Chipkill-correct systems can withstand failure of an entire DRAM chip
• For chipkill correctness, the 72-bit word must be spread across 72 DRAM chips, or a 13-bit word (8-bit data and 5-bit ECC) must be spread across 13 DRAM chips

RAID-like DRAM Designs
• DRAM chips do not have built-in error detection
• Can employ a 9-chip rank with ECC to detect and recover from a single error; in case of a multi-bit error, rely on a second tier of error correction
• Can do parity across DIMMs (needs an extra DIMM); use ECC within a DIMM to recover from 1-bit errors; use parity across DIMMs to recover from multi-bit errors in 1 DIMM
• Reads are cheap (must only access 1 DIMM); writes are expensive (must read and write 2 DIMMs)
• Used in some HP servers

RAID-like DRAM (Udipi et al., ISCA’10)
• Add a checksum to every row in DRAM; verified at the memory controller
• Adds area overhead, but provides self-contained error detection
• When a chip fails, can reconstruct data by examining another parity DRAM chip
• Can control overheads by having a checksum for a large row or one parity chip for many data chips
• Writes are again problematic

Virtualized ECC (Yoon and Erez, ASPLOS’10)
• Also builds a two-tier error protection scheme, but does the second tier in software
• The second-tier codes are stored in the regular physical address space (not specialized DRAM chips); software has flexibility in terms of the types of codes to use and the types of pages that are protected
• Reads are cheap; writes are expensive as usual; but the second-tier codes can now be cached, which greatly helps reduce the number of DRAM writes

Phase Change Memory
• Emerging NVM technology that can replace Flash and DRAM
• Much higher density; much better scalability; can do multi-level cells
• When materials (GST) are heated (with electrical pulses) and then cooled, they form either crystalline or amorphous materials depending on the intensity and duration of the pulses; crystalline materials have low resistance (1 state) and amorphous materials have high resistance (0 state)
• Non-volatile, fast reads (~50ns), slow and energy-hungry writes; limited lifetime (~10^8 writes per cell), no leakage

Optimizations for Writes (Energy, Lifetime)
• Read a line before writing and only write the modified bits (Zhou et al., ISCA’09)
• Write either the line or its inverted version, whichever causes fewer bit-flips (Cho and Lee, MICRO’09)
• Only write dirty lines in a PCM page when a page is evicted from a DRAM cache (Lee et al., Qureshi et al., ISCA’09)
• When a page is brought from disk, place it only in the DRAM cache and place it in PCM upon eviction (Qureshi et al., ISCA’09)
• Wear-leveling: rotate every new page, shift a row periodically, swap segments (Zhou et al., Qureshi et al., ISCA’09)

Hard Error Tolerance in PCM
• PCM cells will eventually fail; it is important that capacity degrades gradually when this happens
• Pairing: among the pool of faulty pages, pair two pages that have faults in different locations; replicate data across the two pages (Ipek et al., ASPLOS’10)
• Errors are detected with parity bits; replica reads are issued if the initial read is faulty

ECP (Schechter et al., ISCA’10)
• Instead of using ECC to handle a few transient faults in DRAM, use error-correcting pointers to handle hard errors in specific locations
• For a 512-bit line with 1 failed bit, maintain a 9-bit field to track the failed location and another bit to store the value in that location
• Can store multiple such pointers and can recover from faults in the pointers too
• ECC has similar storage overhead and can handle soft errors; but ECC has high entropy and can hasten wearout

SAFER (Seong et al., MICRO 2010)
• Most PCM hard errors are stuck-at faults (stuck at 0 or stuck at 1)
• Either write the word or its flipped version so that the failed bit is made to store the stuck-at value
• For multi-bit errors, the line can be partitioned such that each partition has a single error
• Errors are detected by verifying a write; recently failed bit locations are cached so multiple writes can be avoided

FREE-p (Yoon et al., HPCA 2011)
• When a PCM block is unusable because the number of hard errors has exceeded the ECC capability, it is remapped to another address; the pointer to this address is stored in the failed block
• The pointer can be replicated many times in the failed block to tolerate the multiple errors in the failed block
• Requires two accesses when handling failed blocks; this overhead can be reduced by caching the pointer at the memory controller
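
Worked example for the ECC slide: the check-bit formula r = t * ceil(log2 k) + 1 is easy to evaluate for a few data widths. The sketch below is only an illustration of that arithmetic; the function names `bch_check_bits` and `overhead_percent` are made up for this example and do not come from the lecture.

```python
def bch_check_bits(t: int, k: int) -> int:
    """Check bits r = t * ceil(log2 k) + 1, as given on the ECC slide (k >= 2)."""
    ceil_log2_k = (k - 1).bit_length()  # exact integer ceil(log2 k) for k >= 2
    return t * ceil_log2_k + 1


def overhead_percent(check_bits: int, data_bits: int) -> float:
    """Storage overhead of the code relative to the data it protects."""
    return 100.0 * check_bits / data_bits


if __name__ == "__main__":
    for t, k in [(1, 64), (2, 64), (1, 512)]:
        r = bch_check_bits(t, k)
        print(f"t={t}, k={k}: r={r} bits, overhead={overhead_percent(r, k):.1f}%")
    # The 8-bit code attached to each 64-bit DRAM word, for comparison.
    print(f"8 bits per 64-bit word: {overhead_percent(8, 64):.1f}% overhead")
```

For t = 1 and k = 64 the formula gives 7 bits, one fewer than the 8-bit code the slide describes for DRAM; for t = 1 and k = 512 it gives 10 bits, the same size as one ECP entry (9-bit pointer plus 1 replacement bit).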
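
Sketch for the RAID-like DRAM slides: parity across DIMMs (or across chips) can be modeled as a bitwise XOR of the data units. The code below is a minimal illustration under that assumption, with integers standing in for the data held by each unit; the helper names are hypothetical. It also shows why writes are expensive: updating parity needs the old data and the old parity, so two units are read and two are written.

```python
from functools import reduce
from operator import xor


def parity(units):
    """XOR parity over all data units (each unit modeled as an int)."""
    return reduce(xor, units, 0)


def reconstruct(units, failed_index, parity_value):
    """Rebuild the failed unit from the survivors and the parity unit."""
    survivors = [u for i, u in enumerate(units) if i != failed_index]
    return reduce(xor, survivors, parity_value)


def updated_parity(old_parity, old_data, new_data):
    """Parity after one unit changes: needs a read of old data and old parity."""
    return old_parity ^ old_data ^ new_data


if __name__ == "__main__":
    units = [0x12, 0x34, 0x56, 0x78]
    p = parity(units)
    # Pretend unit 2 failed and recover its contents.
    print(hex(reconstruct(units, 2, p)))
```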
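
Sketch for the write-optimization slide: the first two bullets (write only the modified bits; write the line or its inverse, whichever flips fewer bits) can be combined in a few lines. This is a simplified model, not the papers' implementation: a line is an integer of `width` bits with a single flip flag, and the cost counts only cells whose value changes, including the flag cell. All names are assumptions made for the example.

```python
def bit_flips(old: int, new: int) -> int:
    """Number of cells that change when 'new' overwrites 'old'."""
    return bin(old ^ new).count("1")


def flip_n_write(stored: int, flag: int, new_data: int, width: int):
    """Choose between new_data and its inverse; return (stored, flag, cells_changed).

    'stored' and 'flag' are the current raw cell contents; flag == 1 means the
    stored bits are the inverse of the logical data.
    """
    mask = (1 << width) - 1
    plain = new_data & mask
    inverted = ~new_data & mask
    plain_cost = bit_flips(stored, plain) + (flag ^ 0)      # flag must end up 0
    inverted_cost = bit_flips(stored, inverted) + (flag ^ 1)  # flag must end up 1
    if inverted_cost < plain_cost:
        return inverted, 1, inverted_cost
    return plain, 0, plain_cost


def decode(stored: int, flag: int, width: int) -> int:
    """Recover the logical data from the raw cells and the flip flag."""
    mask = (1 << width) - 1
    return (~stored & mask) if flag else stored


if __name__ == "__main__":
    stored, flag, cost = flip_n_write(0b00000000, 0, 0b11111110, 8)
    print(bin(stored), flag, cost, bin(decode(stored, flag, 8)))
```

In the usage line, writing 0b11111110 over an all-zero line would change 7 cells directly, while storing the inverse changes 1 data cell plus the flag.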
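
Sketch for the ECP slide: each correction entry pairs a 9-bit pointer (enough to address any cell in a 512-bit line) with one replacement bit. The toy model below assumes stuck-at cells are simulated with a dictionary, that failures are discovered by verifying each write, and that the pointers themselves never fail (the slide notes the real scheme can recover from pointer faults too). The class name `EcpLine` and its fields are hypothetical.

```python
LINE_BITS = 512
POINTER_BITS = 9  # 2**9 == 512, so 9 bits address any cell in the line


def set_bit(value: int, pos: int, bit: int) -> int:
    """Return 'value' with the bit at 'pos' forced to 'bit'."""
    return (value & ~(1 << pos)) | (bit << pos)


class EcpLine:
    def __init__(self, stuck, max_entries=6):
        self.stuck = dict(stuck)   # simulated faults: position -> stuck value
        self.cells = 0             # raw contents of the data cells
        self.pointers = []         # positions of known failed cells
        self.replacements = []     # replacement bit for each pointer
        self.max_entries = max_entries

    def write(self, data: int) -> None:
        raw = data
        for pos, val in self.stuck.items():
            raw = set_bit(raw, pos, val)  # stuck cells ignore the write
        self.cells = raw
        diff = raw ^ data  # write-verify: cells that did not take the value
        for pos in range(LINE_BITS):
            if (diff >> pos) & 1 and pos not in self.pointers:
                if len(self.pointers) == self.max_entries:
                    raise ValueError("more failed cells than correction entries")
                self.pointers.append(pos)
        # Replacement bits always hold the intended value of each failed cell.
        self.replacements = [(data >> pos) & 1 for pos in self.pointers]

    def read(self) -> int:
        value = self.cells
        for pos, bit in zip(self.pointers, self.replacements):
            value = set_bit(value, pos, bit)
        return value

    def overhead_bits(self) -> int:
        """Bits used by the allocated entries: pointer plus replacement bit each."""
        return len(self.pointers) * (POINTER_BITS + 1)
```

A short usage example: `line = EcpLine({5: 0, 300: 1}); line.write((1 << 5) | (1 << 100))` leaves the raw cells wrong at positions 5 and 300, yet `line.read()` returns the written value because both positions are patched from the replacement bits.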

