CMU CS 15740 - A Performance Comparison of Contemporary DRAM Architectures


Copyright © 1999 IEEE. Published in the Proceedings of the 26th International Symposium on Computer Architecture, May 2-4, 1999, in Atlanta, GA, USA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes, or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.

ABSTRACT

In response to the growing gap between memory access time and processor speed, DRAM manufacturers have created several new DRAM architectures. This paper presents a simulation-based performance study of a representative group, each evaluated in a small system organization. These small-system organizations correspond to workstation-class computers and use on the order of 10 DRAM chips. The study covers Fast Page Mode, Extended Data Out, Synchronous, Enhanced Synchronous, Synchronous Link, Rambus, and Direct Rambus designs. Our simulations reveal several things: (a) current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem; (b) bus transmission speed will soon become a primary factor limiting memory-system performance; (c) the post-L2 address stream still contains significant locality, though it varies from application to application; and (d) as we move to wider buses, row access time becomes more prominent, making it important to investigate techniques to exploit the available locality to decrease access time.

1 INTRODUCTION

In response to the growing gap between memory access time and processor speed, DRAM manufacturers have created several new DRAM architectures.
This paper presents a simulation-based performance study of a representative group, evaluating each in terms of its effect on total execution time. We simulate the performance of seven DRAM architectures: Fast Page Mode [35], Extended Data Out [16], Synchronous [17], Enhanced Synchronous [10], Synchronous Link [38], Rambus [31], and Direct Rambus [32]. While there are a number of academic proposals for new DRAM designs, space limits us to covering only existent commercial parts. To obtain accurate memory-request timing for an aggressive out-of-order processor, we integrate our code into the SimpleScalar tool set [4].

This paper presents a baseline study of a small-system DRAM organization: these are systems with only a handful of DRAM chips (0.1-1 GB). We do not consider large-system DRAM organizations with many gigabytes of storage that are highly interleaved. The study asks and answers the following questions:

• What is the effect of improvements in DRAM technology on the memory latency and bandwidth problems?

Contemporary techniques for improving processor performance and tolerating memory latency are exacerbating the memory bandwidth problem [5]. Our results show that current DRAM architectures are attacking exactly this problem: the most recent technologies (SDRAM, ESDRAM, and Rambus) have reduced the stall time due to limited bandwidth by a factor of three compared to earlier DRAM architectures. However, the memory-latency component of overhead has not improved.

• Where is time spent in the primary memory system (the memory system beyond the cache hierarchy, but not including secondary [disk] or tertiary [backup] storage)?
What is the performance benefit of exploiting the page mode of contemporary DRAMs?

For the newer DRAM designs, the time to extract the required data from the sense amps/row caches for transmission on the memory bus is the largest component in the average access time, though page mode allows this to be overlapped with column access and the time to transmit the data over the memory bus.

• How much locality is there in the address stream that reaches the primary memory system?

The stream of addresses that miss the L2 cache contains a significant amount of locality, as measured by the hit rates in the DRAM row buffers. The hit rates for the applications studied range from 8-95%, with a mean hit rate of 40% for a 1 MB L2 cache. (This does not include hits to the row buffers when making multiple DRAM requests to read one cache line.)

We also make several observations. First, there is a one-time trade-off between cost, bandwidth, and latency: to a point, latency can be decreased by ganging together multiple DRAMs into a wide structure. This trades dollars for bandwidth that reduces latency, because a request size is typically much larger than the DRAM transfer width. Page mode and interleaving are similar optimizations that work because a request size is typically larger than the bus width. However, the latency benefits are limited by bus and DRAM speeds: to get further improvements, one must run the DRAM core and bus at faster speeds. Current memory busses are adequate for small systems but are likely inadequate for large ones. Embedded DRAM [5, 19, 37] is not a near-term solution, as its performance is poor on high-end workloads [3]. Faster buses are more likely solutions: witness the elimination of the slow intermediate memory bus in future systems [12]. Another solution is to internally bank the memory array into many small arrays so that each can be accessed very quickly, as in the MoSys Multibank DRAM architecture [39].

Second, widening buses will present new optimization opportunities.
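The row-buffer hit rates quoted above come from an open-page policy: each bank keeps its last-accessed row latched in the sense amps, and a request to that same row is served without a new row activation. A minimal sketch of that measurement is below; the address stream, row size, and bank mapping are illustrative assumptions, not the paper's simulator.

```python
# Hypothetical sketch of measuring row-buffer locality in a post-L2
# miss stream under an open-page policy. Row size and bank mapping
# are invented for illustration.

def row_buffer_hit_rate(addresses, row_bytes=2048, n_banks=8):
    """Each bank keeps its last-accessed row open; a request to the
    same row in the same bank counts as a row-buffer hit."""
    open_rows = {}                    # bank -> currently open row
    hits = 0
    for addr in addresses:
        row = addr // row_bytes       # which DRAM row the address maps to
        bank = row % n_banks          # simple row-interleaved bank mapping
        if open_rows.get(bank) == row:
            hits += 1                 # row already open: page-mode access
        else:
            open_rows[bank] = row     # row miss: activate the new row
    return hits / len(addresses)

# A strided stream that mostly stays within one row shows high locality:
stream = [base + 64 * i for base in (0, 1 << 20) for i in range(16)]
print(f"hit rate: {row_buffer_hit_rate(stream):.2f}")   # prints: hit rate: 0.94
```

A streaming workload that touches each row many times before moving on sits near the top of the 8-95% range above; pointer-chasing streams that hop between rows sit near the bottom.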
Each application exhibits a different degree of locality and therefore benefits from page mode to a different degree. As buses widen, this effect becomes more pronounced, to the extent that different applications can have average access times that differ by 50%. This is a minor issue considering current bus technology. However, future bus technologies will expose the row access as the primary performance bottleneck, justifying the exploration of mechanisms to exploit locality to guarantee hits in the DRAM row buffers: e.g., row-buffer victim caches, prediction mechanisms, etc.

Third, while buses as wide as the L2 cache yield the best memory latency, they cannot halve the latency of a bus half as wide. Page mode overlaps the components of DRAM access when making multiple requests to the same row. If the bus is as wide as a request, one

A Performance Comparison of Contemporary DRAM Architectures
Vinodh Cuppu, Bruce Jacob
Brian Davis, Trevor Mudge
Dept. of Electrical & Computer Engineering
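The third observation, that a full-width bus cannot halve the latency of a half-width one, follows from the overlap page mode provides: only the extra bursts' transfer times sit on the critical path. A back-of-envelope timing model makes this concrete; all parameter values here are invented for illustration, not figures from the paper.

```python
# Hypothetical timing model of a cache-line fill from DRAM. Row access,
# column access, and per-burst transfer times are illustrative assumptions.

def cacheline_fill_time(line_bytes, bus_bytes, t_row=30, t_col=10, t_xfer=10):
    """Cycles to fetch one cache line from a closed row.

    The row activation and first column access are paid once; with page
    mode, each later burst's column access proceeds during the previous
    burst's data transfer, so only max(t_col, t_xfer) per extra burst
    remains on the critical path.
    """
    bursts = line_bytes // bus_bytes          # DRAM requests per cache line
    return t_row + t_col + t_xfer + (bursts - 1) * max(t_col, t_xfer)

full = cacheline_fill_time(64, 64)   # bus as wide as the cache line
half = cacheline_fill_time(64, 32)   # half-width bus: one extra burst
# Doubling the bus width removes one overlapped burst (50 vs. 60 cycles
# here), nowhere near halving the total latency.
print(full, half)
```

Under these assumed numbers the fixed row-plus-column cost dominates, which is exactly why the paper argues that wider buses push the row access forward as the next bottleneck.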

